PREFACE TO THE SECOND EDITION


With the first edition out of print, we decided to arrange for republication
of Denumerable Markov Chains with additional bibliographic
material. The new edition contains a section Additional Notes that 
indicates some of the developments in Markov chain theory over the 
last ten years. As in the first edition and for the same reasons, we have 
resisted the temptation to follow the theory in directions that deal with 
uncountable state spaces or continuous time. A section entitled 
Additional References complements the Additional Notes. 

J. W. Pitman pointed out an error in Theorem 9-53 of the first 
edition, which we have corrected. More detail about the correction 
appears in the Additional Notes. Aside from this change, we have left 
intact the text of the first eleven chapters. 

The second edition contains a twelfth chapter, written by David 
Griffeath, on Markov random fields. We are grateful to Ted Cox for 
his help in preparing this material. Notes for the chapter appear in the 
section Additional Notes. 


J.G.K., J.L.S., A.W.K. 
March, 1976 


PREFACE TO THE FIRST EDITION 


Our purpose in writing this monograph has been to provide a systematic
treatment of denumerable Markov chains, covering both the
foundations of the subject and some topics in potential theory and 
boundary theory. Much of the material included is now available only 
in recent research papers. The book’s theme is a discussion of relations 
among what might be called the descriptive quantities associated with 
Markov chains—probabilities of events and means of random variables 
that give insight into the behavior of the chains. 

We make no pretense of being complete. Indeed, we have omitted 
many results which we feel are not directly related to the main theme, 
especially when they are available in easily accessible sources. Thus, 
for example, we have only touched on independent trials processes, 
sums of independent random variables, and limit theorems. On the other 
hand, we have made an attempt to see that the book is self-contained, 
in order that a mathematician can read it without continually referring 
to outside sources. It may therefore prove useful in graduate seminars. 

Denumerable Markov chains are in a peculiar position in that the 
methods of functional analysis which are used in handling more general 
chains apply only to a relatively small class of denumerable chains.
Instead, another approach has been necessary, and we have chosen to use
infinite matrices. They simplify the notation, shorten statements and 
proofs of theorems, and often suggest new results. They also enable one 
to exploit the duality between measures and functions to the fullest. 

The monograph divides naturally into four parts, the first three
consisting of three chapters each and the fourth containing the last two
chapters. 

Part I provides background material for the theory of Markov chains. 
It is included to help make the book self-contained and should facilitate 
the use of the book in advanced seminars. Part II contains basic results 
on denumerable Markov chains, and Part III deals with discrete potential
theory. Part IV treats boundary theory for both transient and recurrent
chains. The analytical prerequisites for the two chapters in this
last part exceed those for the earlier parts of the book and are not all 
included in Part I. Primarily, Part IV presumes that the reader is 
familiar with the topology and measure theory of compact metric 
spaces, in addition to the contents of Part I. 



Two chapters, Chapters 1 and 7, require special comments. Chapter 1
contains prerequisites from the theory of infinite matrices and
some other topics in analysis. In it Sections 1 and 5 are the most important
for an understanding of the later chapters. Chapter 7, entitled
“Introduction to Potential Theory,” is a chapter of motivation and should 
be read as such. Its intent is to point out why classical potential theory 
and Markov chains should be at all related. 

The book contains 239 problems, some at the end of each chapter 
except Chapters 1 and 7. 

For the most part, historical references do not appear in the text 
but are collected in one segment at the end of the book. 

Some remarks about notation may be helpful. We use sparingly 
the word “Theorem” to indicate the most significant results of the 
monograph; other results are labeled “Lemma,” “Proposition,” and 
“Corollary” in accordance with common usage. The end of each proof 
is indicated by a blank line. Several examples of Markov chains are 
worked out in detail and recur at intervals; although there is normally 
little interdependence between distinct examples, different instances of 
the same example may be expected to build on one another. 

A complete list of symbols used in the book appears in a list separate 
from the index. 

We wish to thank Susan Knapp for typing and proof-reading the 
manuscript. 

We are doubly indebted to the National Science Foundation: First, 
a number of original results and simplified proofs of known results were 
developed as part of a research project supported by the Foundation. 
And second, we are grateful for the support provided toward the 
preparation of this manuscript. 
CHAPTER 1


PREREQUISITES FROM ANALYSIS 


1. Denumerable matrices 


The word denumerable in the sequel means finite or countably 
infinite. Let M and N be two non-empty denumerable sets. A 
matrix is a function with domain the set of ordered pairs (m, n), where 
$m \in M$ and $n \in N$, and with range a subset of the extended real number
system, the reals with $+\infty$ and $-\infty$ adjoined. We call the sets M
and N index sets. The matrix is called a finite matrix if both M and N 
are finite sets. 

To say that the m-nth entry of the matrix is x or is equal to x, we 
mean that the value of the function on the pair (m, n) is x. A matrix 
is said to be non-negative if all of its entries are non-negative, and it is 
said to be positive if all of its entries are positive. We agree to use 
upper-case italic letters to stand for matrices. If A is a matrix, we 
denote the m-nth entry of A by $A_{mn}$. Some examples of matrices are
as follows: 


(1) If all entries of a matrix are equal to zero, we say that the matrix 
is the zero matrix, denoted by 0. 

(2) A matrix for which M and N are the same set is called a square 
matrix. The entries corresponding to m = n are diagonal entries; other 
entries are off-diagonal entries. 

(3) A square matrix whose off-diagonal entries all equal zero is a 
diagonal matrix. The diagonal matrix obtained from a square matrix 
A by setting all of its off-diagonal entries equal to zero is denoted $A_{dg}$.

(4) A diagonal matrix whose diagonal entries are all equal to one is 
called the identity matrix, denoted by I.

(5) A matrix whose second index set contains only one element is 
called a column vector. If we wish to distinguish a column vector from 
an arbitrary matrix, we shall denote the former by a lower-case italic
letter.

(6) A matrix whose first index set contains only one element is called
a row vector. If we wish to distinguish a row vector from an arbitrary 
matrix, we shall denote the former by a lower-case Greek letter. 

(7) If A is a matrix defined on index sets M and N, define a matrix 
$A^T$, called the transpose of A, to have index sets N and M and to have
entries given by $(A^T)_{nm} = A_{mn}$. The transpose of the transpose of A is
simply A. 

(8) The column vector all of whose entries are equal to one is denoted 
$\mathbf{1}$; the row vector with all entries one is $\mathbf{1}^T$. A matrix other than a row
or column vector which has all entries equal to one is denoted by E.

(9) If A is an arbitrary matrix and c is a real number, cA is the 
matrix whose entries are given by $(cA)_{mn} = cA_{mn}$.

(10) The matrix $-A$ is defined to be the matrix $(-1)A$.

(11) A constant (column) vector is a vector of the form $c\mathbf{1}$ for some
extended real number c. 

(12) A bounded vector is a vector all of whose entries are less than or 
equal in absolute value to some finite real number c. 
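Because a matrix here is simply a function on ordered pairs, it can be modeled directly as a dictionary keyed by (m, n), with no ordering imposed on the index sets. The following is a minimal Python sketch, assuming finite index sets. The helper names (`transpose`, `diagonal_part`, `identity`, `ones_column`) and the placeholder label `"*"` for the single column index are illustrative choices, not notation from the text.

```python
# A matrix is a function on ordered pairs (m, n); here it is a dict keyed by
# (m, n). Index sets are plain Python sets, so no ordering is ever imposed.

def transpose(A):
    """A^T, with (A^T)_{nm} = A_{mn}."""
    return {(n, m): value for (m, n), value in A.items()}

def diagonal_part(A):
    """A_dg: the diagonal matrix obtained by zeroing the off-diagonal entries."""
    return {(m, n): (value if m == n else 0) for (m, n), value in A.items()}

def identity(index_set):
    """The identity matrix I on the given index set."""
    return {(m, n): (1 if m == n else 0) for m in index_set for n in index_set}

def ones_column(index_set, column_label="*"):
    """The column vector 1; its second index set is the single element column_label."""
    return {(m, column_label): 1 for m in index_set}

A = {("a", "x"): 1, ("a", "y"): 2, ("b", "x"): 3, ("b", "y"): 4}
print(transpose(A)[("y", "a")])      # 2, i.e. (A^T)_{ya} = A_{ay}
print(transpose(transpose(A)) == A)  # True
```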


Two matrices A and B are equal, written A = B, if they have the
same index sets and if $A_{mn} = B_{mn}$ for every m and n. Inequalities are
defined similarly. For example, $A \ge B$ if A and B have the same
index sets and if $A_{mn} \ge B_{mn}$ for every m and n. In particular,
non-negative matrices are those for which $A \ge 0$, and positive matrices are
those for which $A > 0$.

Addition of matrices is defined for matrices A and B having the same 
index sets M and N. Their sum C = A + B has the same index sets, 
and addition is defined entry-by-entry: 


$$C_{mn} = A_{mn} + B_{mn}.$$


The sum C = A + B is well defined if no entry of C is given by $\infty - \infty$
or by $-\infty + \infty$. We leave the verification of the following properties
of matrices with index sets M and N to the reader:


(1) A + 0 = A for every A. 
(2) For every A having all entries finite, $A + (-A) = 0$.
(3) For any matrices A, B, and C,

$$A + (B + C) = (A + B) + C$$


if the indicated sums on at least one side of the equality are 
well defined. 
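Entrywise addition, including the rule that an entry of the form $\infty - \infty$ leaves the sum undefined, can be sketched in Python as follows. This assumes matrices stored as dictionaries keyed by (m, n) and uses `math.inf` for the extended reals; the function name `add` is hypothetical. Python itself returns `nan` for `inf + (-inf)`, so the check has to be made explicitly.

```python
import math

def add(A, B):
    """Entrywise sum C = A + B of matrices stored as dicts keyed by (m, n).

    Raises ValueError if the index pairs differ, or if some entry would have
    the form (+inf) + (-inf), which the text treats as not well defined.
    """
    if A.keys() != B.keys():
        raise ValueError("A and B must have the same index sets")
    C = {}
    for key in A:
        a, b = A[key], B[key]
        # Python would quietly return nan for inf + (-inf); reject it instead.
        if math.isinf(a) and math.isinf(b) and (a > 0) != (b > 0):
            raise ValueError(f"entry {key} has the form inf - inf")
        C[key] = a + b
    return C

inf = math.inf
print(add({(0, 0): 1.0, (0, 1): inf}, {(0, 0): 2.0, (0, 1): 5.0}))
# {(0, 0): 3.0, (0, 1): inf}
```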


Up to now, we have imposed no orderings on our index sets, and in 
fact nothing we have done so far necessitates doing so. We shall define 
even matrix multiplication shortly in a way that requires no ordering. 
There is, however, a standard way of representing matrices as
rectangular arrays, and for this purpose one normally orders the index
sets with the usual ordering on the non-negative integers. The 
elements of the index sets are thus numbered 0, 1, 2,... either up to 
some integer r if the index set is finite or indefinitely if the index set is 
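infinite. Under such orderings of its index sets, a matrix A is
represented as

$$A = \begin{pmatrix} A_{00} & A_{01} & A_{02} & \cdots \\ A_{10} & A_{11} & A_{12} & \cdots \\ A_{20} & A_{21} & A_{22} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$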

We note that other representations are possible if at least one of the 
index sets is infinite; such representations come from ordering the 
index sets with an order type other than that of the non-negative 
integers. We shall meet another order type with its corresponding 
representation at the end of this section. We point out, however, that 
orderings are completely irrelevant as far as the fundamental properties 
of matrices are concerned, and we shall have little occasion to refer to 
them again.

For any real number $a_m$, define $a_m^+$ and $a_m^-$ by

$$a_m^+ = \max(a_m, 0), \qquad a_m^- = -\min(a_m, 0).$$

The sum of denumerably many non-negative terms $\sum_{m \in M} a_m^+$ or
$\sum_{m \in M} a_m^-$ always exists independently of any ordering on M.
Therefore, we say that $\sum_{m \in M} a_m = \sum_{m \in M} a_m^+ - \sum_{m \in M} a_m^-$ is well defined if
not both of $\sum_{m \in M} a_m^+$ and $\sum_{m \in M} a_m^-$ are infinite.
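A standard example shows that this notion of a well-defined sum is stricter than convergence of a series summed in a particular order. Take M to be the non-negative integers and $a_m = (-1)^m/(m+1)$. The alternating series converges in the usual order, but both of its parts diverge:

$$a_m^+ = \begin{cases} \frac{1}{m+1}, & m \text{ even}, \\ 0, & m \text{ odd}, \end{cases} \qquad \sum_{m \in M} a_m^+ = 1 + \frac{1}{3} + \frac{1}{5} + \cdots = +\infty,$$

$$a_m^- = \begin{cases} 0, & m \text{ even}, \\ \frac{1}{m+1}, & m \text{ odd}, \end{cases} \qquad \sum_{m \in M} a_m^- = \frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \cdots = +\infty,$$

so $\sum_{m \in M} a_m$ is not well defined in the sense above.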


Definition 1-1: Let A be a matrix with index sets K and M, and let
B be a matrix with index sets M and N. Suppose the sums

$$\sum_{m \in M} A_{km} B_{mn}$$

are well defined for every k and every n. Then the matrix product
C = AB is said to be well defined; its index sets are K and N, and its
entries are given by $C_{kn} = \sum_{m \in M} A_{km} B_{mn}$. Matrix multiplication is not
defined unless all of these properties hold.
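For finite index sets, Definition 1-1 can be sketched in Python. The sketch assumes matrices stored as dictionaries keyed by index pairs, with entries drawn from the extended reals via `math.inf`. It also assumes the convention 0 · (±∞) = 0, which the text does not state here. The helper names `ext_mul`, `well_defined_sum`, and `matmul` are hypothetical. A product is reported as undefined (`None`) as soon as one of its sums has both its positive and its negative part infinite.

```python
import math

def ext_mul(a, b):
    """Product of extended reals; ASSUMPTION: 0 * (+/-inf) is taken to be 0."""
    if a == 0 or b == 0:
        return 0.0
    return a * b

def well_defined_sum(terms):
    """Sum of the positive parts minus sum of the negative parts.

    Returns None when both parts are infinite, i.e. the sum is not well defined.
    """
    terms = list(terms)
    positive = sum(max(t, 0.0) for t in terms)
    negative = sum(-min(t, 0.0) for t in terms)
    if math.isinf(positive) and math.isinf(negative):
        return None
    return positive - negative

def matmul(A, B, K, M, N):
    """C = AB for A on K x M and B on M x N; None if some C_kn is not well defined."""
    C = {}
    for k in K:
        for n in N:
            s = well_defined_sum(ext_mul(A[(k, m)], B[(m, n)]) for m in M)
            if s is None:
                return None
            C[(k, n)] = s
    return C

inf = math.inf
row = {("r", 0): 1.0, ("r", 1): 1.0}
print(matmul(row, {(0, "c"): inf, (1, "c"): 0.0}, ["r"], [0, 1], ["c"]))   # {('r', 'c'): inf}
print(matmul(row, {(0, "c"): inf, (1, "c"): -inf}, ["r"], [0, 1], ["c"]))  # None
```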


Most of the propositions and theorems about matrices that we shall 
deal with are statements of equality of matrices A = B. Such statements
are really just assertions about the equality of corresponding
entries of A and B, and a proof that A equals B need only contain an
argument that an arbitrary entry of A equals the corresponding entry
of B. With this understanding, we see that the proof of the additive 
properties of matrices is reduced to a trivial repetition of the properties 
of real numbers. Propositions about multiplication, however, when 
looked at entry-by-entry involve a new idea. 

Let A be a matrix with index sets M and N and let m and n be fixed 
elements of M and N, respectively. The mth row of A is defined to be 
the restriction of the function A to the domain of pairs (m, s), where s 
runs through the set N. Similarly the nth column of A is defined to 
be the restriction of the function A to the domain of pairs (t, n), where 
t runs through the elements of the set M. We note that the mth row of
a matrix is a row vector and that the nth column is a column vector.
With these conventions matrices can be thought of as sets of rows or as
sets of columns, and addition of matrices is simply addition of
corresponding rows or columns of the matrices involved. Furthermore, the
k-nth entry in the matrix product of A and B is the product of the kth
row of A by the nth column of B and is of the form $\sum_{m \in M} \pi_m f_m$, where π
is a row vector and f is a column vector. That is, propositions about
matrix multiplication, when proved entry-by-entry, may sometimes be
proved by considering only the product of a row vector and a column
vector.

Because of the correspondence of row vectors to rows and column 
vectors to columns, we shall agree to call the domain of a row vector or 
a column vector the elements of a single index set. 

Connected with any definition of multiplication are five properties 
which may or may not be valid for the structure being considered. All 
five of the properties do hold for the real numbers, and we state them 
in this context: 


(1) Existence and uniqueness of a multiplicative identity. The real
number 1 satisfies $c \cdot 1 = 1 \cdot c = c$ for every c.

(2) Commutativity: ab = ba 

(3) Distributivity: a(b +c) = ab + ac 

(a + b)c = ac + bc

(4) Associativity: a(bc) = (ab)c 

(5) Existence and uniqueness of multiplicative inverses of all 
non-zero elements. 


We can easily settle whether the first two properties hold for matrix 
multiplication. First, the identity matrix I plays the role of the 
multiplicative identity, and the identity is clearly unique. Second, 
commutativity can be expected to fail except in special cases because it 
is not even necessary for the index sets of two matrices to agree properly 
after the order of multiplication has been reversed. 
The validity of the third property, that of distributivity, is the
content of the next proposition. 


Proposition 1-2: If A, B, and C are matrices and if AB, AC, and 
AB + AC are well defined, then A(B + C) = AB + AC. Similarly 
(D + E)F = DF + EF if DF, EF, and DF + EF are all well
defined. 


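Proof: We prove only the first assertion. We may assume that A is
a row vector π and that B and C are column vectors f and g. Then

$$\begin{aligned} \pi f + \pi g &= \sum_{m \in M} \pi_m f_m + \sum_{m \in M} \pi_m g_m \\ &= \sum_{m \in M} (\pi_m f_m + \pi_m g_m) \\ &= \sum_{m \in M} \pi_m (f_m + g_m) \\ &= \pi(f + g). \end{aligned}$$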


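The fourth and fifth properties are related and nontrivial.
Associativity does not always hold, but useful sufficient criteria for its
validity are known. For an example of how associativity may fail, let A be a
matrix whose index sets are the non-negative integers and whose entries
are given by

$$A = \begin{pmatrix} 1 & -1 & 0 & 0 & \cdots \\ 0 & 1 & -1 & 0 & \cdots \\ 0 & 0 & 1 & -1 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

that is, $A_{mm} = 1$, $A_{m,m+1} = -1$, and all other entries are zero. Every
row of A sums to zero, so $A\mathbf{1} = 0$. Column 0 sums to 1 and every other
column sums to zero, so $\mathbf{1}^T A$ has a single non-zero entry, equal to 1. Then

$$\mathbf{1}^T(A\mathbf{1}) = 0,$$

whereas

$$(\mathbf{1}^T A)\mathbf{1} = 1.$$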


All the products involved are well defined, but the multiplications do 
not associate. 
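The counterexample can be checked mechanically. Every row and every column of A has at most two non-zero entries, so each entry of $A\mathbf{1}$ and of $\mathbf{1}^T A$ is an exact finite sum. The Python sketch below therefore works with the infinite matrix itself rather than with a finite truncation. Truncating A to a finite square would change its last row, and by Corollary 1-6 the truncated products would then associate. The function names are hypothetical.

```python
def A(m, n):
    """Entry A_mn of the counterexample: 1 on the diagonal, -1 just above it."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def A_times_ones(m):
    """(A1)_m: row m is non-zero only in columns m and m + 1."""
    return sum(A(m, n) for n in (m, m + 1))

def ones_times_A(n):
    """(1^T A)_n: column n is non-zero only in rows n - 1 and n (row -1 does not exist)."""
    return sum(A(m, n) for m in (n - 1, n) if m >= 0)

def partial_sum(entry, count):
    """Sum of entries 0, 1, ..., count - 1 of a vector given as a function."""
    return sum(entry(i) for i in range(count))

for count in (1, 10, 1000):
    # partial sums of 1^T(A1) and of (1^T A)1: always 0 and 1 respectively
    print(count, partial_sum(A_times_ones, count), partial_sum(ones_times_A, count))
```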

We shall not consider the problem of existence of inverses, but 
uniqueness rests upon associativity. For suppose AB = BA = AC =
CA = I. Since AC = I, we have B(AC) = B, and since BA = I, we
have (BA)C = C. Therefore, B = C if and only if B(AC) = (BA)C. 
With this note we proceed with some sufficient conditions for 
associativity. 
Lemma 1-3: Let $b_{ij}$ be a double sequence of real numbers nondecreasing
with i and with j. Then $\lim_i \lim_j b_{ij} = \lim_j \lim_i b_{ij}$, both possibly
infinite.


Proof: In the extended sense $\lim_j b_{ij} = L_i$ exists and so does
$\lim_i b_{ij} = L_j^*$. Now $\{L_i\}$ is nondecreasing, for if $L_i > L_{i+1}$, then for j
sufficiently large $b_{ij} > L_{i+1} \ge b_{i+1,j}$, which is impossible. Similarly
$\{L_j^*\}$ is nondecreasing, so that $\lim_i L_i = L$ and $\lim_j L_j^* = L^*$ exist in the
extended sense. If $L \ne L^*$, we may assume $L^* > L$ and hence L is
finite. Then there exists a j such that $L_j^* > L$. Hence

$$L_j^* > L \ge L_i \ge b_{ij} \quad \text{for all } i.$$

Thus $b_{ij}$ is bounded away from its limit $L_j^*$ as $i \to \infty$, a contradiction.


Following the example of Lemma 1-3, we agree that all limits referred 
to in the future are on the extended real line. 


Proposition 1-4: Non-negative matrices associate under
multiplication.


Proof: Since we are interested in each entry separately of a triple
product, we may assume that we are to show that $\pi(Af) = (\pi A)f$,
where $\pi \ge 0$, $A \ge 0$, $f \ge 0$, π is a row vector, f is a column vector, and
the index sets are subsets of the non-negative integers. Then

$$\pi(Af) = \sum_{m} \sum_{n} \pi_m A_{mn} f_n$$

and

$$(\pi A)f = \sum_{n} \sum_{m} \pi_m A_{mn} f_n.$$

Set $b_{ij} = \sum_{m=0}^{i} \sum_{n=0}^{j} \pi_m A_{mn} f_n = \sum_{n=0}^{j} \sum_{m=0}^{i} \pi_m A_{mn} f_n$ and apply Lemma
1-3 to complete the proof.


If A is an arbitrary matrix we define $A^+$ and $A^-$ by the equations

$$(A^+)_{mn} = \max\{A_{mn}, 0\}, \qquad (A^-)_{mn} = -\min\{A_{mn}, 0\}.$$

Then $A = A^+ - A^-$, $A^+ \ge 0$, and $A^- \ge 0$. For row and column
vectors, the matrices $\pi^+$, $\pi^-$, $f^+$, and $f^-$ are defined analogously.
We note that if Af is well defined, then so are $Af^+$ and $Af^-$. Powers
of matrices are defined inductively by $A^0 = I$, $A^n = A(A^{n-1})$. The
absolute value of a matrix A is $|A| = A^+ + A^-$. Proposition 1-4 now
gives us five corollaries.


Corollary 1-5: Matrices associate if the product of their absolute 
values has all finite entries. 


Proof: We are again to prove that $\pi(Af) = (\pi A)f$, and we do so by
setting $\pi = \pi^+ - \pi^-$, $A = A^+ - A^-$, and $f = f^+ - f^-$, applying
distributivity, and using Proposition 1-4 on the resulting non-negative
matrices.


Corollary 1-6: Finite matrices with finite entries associate.

Proof: The result follows from Corollary 1-5.


Corollary 1-7: If A and B are non-negative matrices and f is a column 
vector such that A(Bf) and (AB)f are both well defined, then A(Bf) =
(AB)f. In particular, if C is a non-negative matrix, if n > 0, and if
$C^n f$ and $C(C^{n-1} f)$ are well defined, then $C^n f = C(C^{n-1} f)$.


Proof: Consider $f^+$ and $f^-$ separately and apply Proposition 1-4.
For the second assertion, set A = C and $B = C^{n-1}$.


Similarly one proves two final corollaries.

Corollary 1-8: If A, B, C, and D are non-negative matrices such that
either 


(1) ABD, AB, and BD, or 
(2) ACD, AC, and CD 


are finite-valued, then $(A(B - C))D = A((B - C)D)$.


Corollary 1-9: If A, B, and C are matrices such that either 


(1) A has only finitely many non-zero entries in each row, 
(2) C has only finitely many non-zero entries in each column, or 
(3) B has only finitely many non-zero entries, 


and if (AB)C and A(BC) are well defined, then
(AB)C = A(BC).

Some of these conditions are cumbersome to check, but there is a
simple sufficient condition. Suppose that we write a general product
as $\prod_{i=1}^{n} (A_i - B_i)$, with $A_i \ge 0$ and $B_i \ge 0$. If all the $2^n$ products

$$A_1 A_2 \cdots A_n, \quad B_1 A_2 \cdots A_n, \quad \ldots, \quad B_1 B_2 \cdots B_n$$

are finite, then we see from Proposition 1-2 and Corollary 1-5 that we
may freely use distributivity and associativity.

The effect of matrix multiplication on matrix inequalities is
summarized by the next proposition, whose proof is left to the reader.


Proposition 1-10: Matrix inequalities of the form $A \ge B$ or $B \le A$
are preserved when both sides of the inequality are multiplied by a
non-negative matrix. Inequalities of the form $A > B$ or $B < A$ are
preserved when both sides are multiplied by a positive matrix, provided
the products have all entries finite.


Next we consider the problem of “block multiplication” of matrices.
The picture we have in mind is the following decomposition of the
matrices involved in a product:

$$\begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix} \begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix} = \begin{pmatrix} C_1 & C_2 \\ C_3 & C_4 \end{pmatrix}$$

More specifically, let K, M, and N be index sets and let K’, M’, and N’,
respectively, be non-empty subsets of the index sets. Impose orderings
on K, M, and N so that the elements of K’, M’, and N’ precede the
other elements, which comprise the complementary sets $\tilde{K}'$, $\tilde{M}'$, and
$\tilde{N}'$. Let A, B, and C be matrices such that

(1) A is defined on K and M,
(2) B is defined on M and N, and
(3) AB = C is well defined.

Let matrices $A_1$, $A_2$, $A_3$, and $A_4$ be defined as the restriction of the
function A to the sets

(1) K’ and M’ for $A_1$,
(2) K’ and $\tilde{M}'$ for $A_2$,
(3) $\tilde{K}'$ and M’ for $A_3$,
(4) $\tilde{K}'$ and $\tilde{M}'$ for $A_4$.

Pictorially what we are doing is writing A as four submatrices with

$$A = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}.$$

We perform the same kind of decomposition for B and C and obtain

$$\begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix} \begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix} = \begin{pmatrix} C_1 & C_2 \\ C_3 & C_4 \end{pmatrix}.$$

The proposition to follow asserts that the submatrices of A, B, and C
multiply as if they were entries themselves. Its proof depends on the
fact that matrix multiplication is defined independently of any ordering 
on the index sets. 


Proposition 1-11:

$$\begin{aligned} A_1 B_1 + A_2 B_3 &= C_1 \\ A_1 B_2 + A_2 B_4 &= C_2 \\ A_3 B_1 + A_4 B_3 &= C_3 \\ A_3 B_2 + A_4 B_4 &= C_4. \end{aligned}$$


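Proof: We prove only the first identity since the others are similar.
For $i \in K'$ and $j \in N'$,

$$(C_1)_{ij} = C_{ij} = \sum_{m \in M} A_{im} B_{mj} = \sum_{m \in M'} A_{im} B_{mj} + \sum_{m \in \tilde{M}'} A_{im} B_{mj} = (A_1 B_1)_{ij} + (A_2 B_3)_{ij}.$$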

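Proposition 1-11 can be spot-checked on finite integer matrices. The Python sketch below stores matrices as dictionaries keyed by index pairs. It chooses K’, M’, and N’ as scattered subsets rather than initial segments, to emphasize that no reordering of the index sets is needed. The helper names and the random test data are illustrative assumptions.

```python
import random

def matmul(A, B, K, M, N):
    """(AB)_kn = sum over m in M of A_km * B_mn, for finite index sets."""
    return {(k, n): sum(A[(k, m)] * B[(m, n)] for m in M) for k in K for n in N}

def restrict(A, rows, cols):
    """Restriction of the function A to the pairs (r, c) with r in rows, c in cols."""
    return {(r, c): A[(r, c)] for r in rows for c in cols}

def add(A, B):
    """Entrywise sum of two matrices with the same index pairs."""
    return {key: A[key] + B[key] for key in A}

def blocks_multiply(A, B, K1, K2, M1, M2, N1, N2):
    """Check the four identities of Proposition 1-11.

    K1, M1, N1 play the roles of K', M', N'; K2, M2, N2 are their complements.
    """
    K, M, N = K1 | K2, M1 | M2, N1 | N2
    C = matmul(A, B, K, M, N)
    A1, A2, A3, A4 = (restrict(A, K1, M1), restrict(A, K1, M2),
                      restrict(A, K2, M1), restrict(A, K2, M2))
    B1, B2, B3, B4 = (restrict(B, M1, N1), restrict(B, M1, N2),
                      restrict(B, M2, N1), restrict(B, M2, N2))
    return (
        add(matmul(A1, B1, K1, M1, N1), matmul(A2, B3, K1, M2, N1)) == restrict(C, K1, N1)
        and add(matmul(A1, B2, K1, M1, N2), matmul(A2, B4, K1, M2, N2)) == restrict(C, K1, N2)
        and add(matmul(A3, B1, K2, M1, N1), matmul(A4, B3, K2, M2, N1)) == restrict(C, K2, N1)
        and add(matmul(A3, B2, K2, M1, N2), matmul(A4, B4, K2, M2, N2)) == restrict(C, K2, N2)
    )

# Hypothetical example: K', M', N' are scattered subsets, not initial segments.
K, M, N = set(range(4)), set(range(5)), set(range(3))
K1, M1, N1 = {1, 3}, {0, 2, 4}, {2}
random.seed(0)
A = {(k, m): random.randint(-5, 5) for k in K for m in M}
B = {(m, n): random.randint(-5, 5) for m in M for n in N}
print(blocks_multiply(A, B, K1, K - K1, M1, M - M1, N1, N - N1))  # True
```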

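Notice that if the submatrix $A_1$ has at least one infinite index set,
then the representation of A by

$$A = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}$$

is not the standard one

$$A = \begin{pmatrix} A_{00} & A_{01} & A_{02} & \cdots \\ A_{10} & A_{11} & A_{12} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$

The ordering on the index sets of A is not of the same type as that of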
the non-negative integers. We recall once more, however, that the 
fundamental properties of matrices are independent of any orderings on 
the index sets. It is only the representation of a matrix as an array 
which requires these orderings. 

Limits of matrices play an important part in the study of denumerable
Markov chains. We shall touch only briefly at this time on the
problems involved.


Definition 1-12: Let $\{A^{(k)}\}$ be a sequence of matrices. We say that
$A = \lim_{k \to \infty} A^{(k)}$ exists if $A_{mn} = \lim_{k \to \infty} (A^{(k)})_{mn}$ exists for every m and
n.


Notice that limits of matrices are defined entry-by-entry. No 
uniformity of convergence to the limiting matrix is assumed. 

The type of problem that arises is as follows. Let π be a row vector
and let $\{f^{(k)}\}$ be a sequence of column vectors converging to a column
vector f. Is it true that $\{\pi f^{(k)}\}$ necessarily converges to πf? The
answer to this question is in the negative unless some additional
hypothesis is added. What is being attempted is an interchange of the
order of two limit operations: one from the series which defines
$\pi f^{(k)}$ and the other from the limit as k tends to ∞. Such an
interchange can be justified only under special circumstances, and we shall
obtain later in this chapter some sufficient conditions as special cases of
theorems of measure theory.
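A standard example shows why additional hypotheses are needed. Take the index set to be the non-negative integers, $\pi = \mathbf{1}^T$, and $f^{(k)}$ the column vector with a 1 in position k and 0 elsewhere. Each fixed entry $f^{(k)}_n$ is 0 once k > n, so $f^{(k)} \to f = 0$ entry-by-entry. Yet $\pi f^{(k)} = 1$ for every k, while $\pi f = 0$. The Python sketch below, with hypothetical function names, simply tabulates these quantities.

```python
def f(k, n):
    """Entry n of the column vector f^(k): 1 in position k, 0 elsewhere."""
    return 1 if n == k else 0

def pi(n):
    """Entry n of the row vector pi = 1^T."""
    return 1

def pi_times_f(k):
    """pi f^(k): only the term with index n = k of the series is non-zero."""
    return pi(k) * f(k, k)

print([f(k, 3) for k in range(8)])        # [0, 0, 0, 1, 0, 0, 0, 0]: entry 3 tends to 0
print([pi_times_f(k) for k in range(8)])  # [1, 1, 1, 1, 1, 1, 1, 1], although pi f = 0
```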


2. Measure theory 


Let X be an arbitrary non-empty set of points and let F be a family 
of subsets of X. We say that F is a field of sets if 


(1) the empty set $\emptyset$ is in F,

(2) whenever A is a set of F, the complement of A, denoted $\tilde{A}$, is in
F, and

(3) whenever A and B are sets of F, so is their union, denoted
$A \cup B$.


A field of sets F is called a Borel field if it has the additional property
that whenever $A_n \in F$ for n = 1, 2, 3, ..., so is $\bigcup_{n=1}^{\infty} A_n$.

The intersection of sets A and B is indicated by $A \cap B$, and the
difference $A \cap \tilde{B}$ is denoted $A - B$. From the above definitions the
reader can easily establish the following result.


Proposition 1-13: If F is a field of sets, then F contains $\emptyset$ and X
and is closed under complementation, finite unions, finite intersections,
and differences. If F is a Borel field, then F is closed under
denumerable intersections.


Proposition 1-14: For any class of sets $\mathcal{C}$ of the points of a set X,
there exists a unique smallest Borel field containing $\mathcal{C}$.


Proof: The family of all subsets of X forms a Borel field containing
$\mathcal{C}$. Form the intersection of all Borel fields which contain $\mathcal{C}$ and call
the resulting family of sets F. Let A be in F; then A is in all Borel
fields containing $\mathcal{C}$ and so is $\tilde{A}$. Hence $\tilde{A}$ is in F. A similar argument
applies to intersections and denumerable unions. Thus F is the
smallest Borel field containing $\mathcal{C}$.
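When X is finite, only finitely many subsets exist, so every field of subsets of X is automatically a Borel field. The smallest Borel field containing a class $\mathcal{C}$ can then be computed by repeatedly closing under complements and pairwise unions. This is a different route from the intersection argument in the proof, and it is valid only for finite X. The Python sketch below, with the hypothetical name `generated_field`, is an illustration, not a general construction.

```python
from itertools import combinations

def generated_field(X, C):
    """Smallest field of subsets of the finite set X containing every set in C.

    On a finite X every field is a Borel field, so closing under complements
    and pairwise unions until nothing new appears is enough.
    """
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(A) for A in C}
    while True:
        new = {X - A for A in family} | {A | B for A, B in combinations(family, 2)}
        if new <= family:
            return family
        family |= new

field = generated_field({1, 2, 3}, [{1}])
print(sorted(sorted(A) for A in field))  # [[], [1], [1, 2, 3], [2, 3]]
```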


Definition 1-15: A function μ from a field of sets F to the extended
real number system is called a set function. If $\mu(A) \ge 0$ for every A
in F, μ is said to be non-negative. If $\mu(A \cup B) = \mu(A) + \mu(B)$
whenever A and B are in F and $A \cap B = \emptyset$, μ is said to be additive.
Suppose $A_n$ is in F for n = 1, 2, 3, ..., and suppose $A_i \cap A_j = \emptyset$
whenever $i \ne j$. If $\mu(\bigcup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} \mu(A_n)$ holds whenever $\bigcup_{n=1}^{\infty} A_n$
is a set of F, then μ is said to be completely additive. In discussing
set functions, we shall assume that there are no two sets A and B in
F such that $\mu(A) = +\infty$ and $\mu(B) = -\infty$, and we shall assume that μ
is not identically infinite.


An additive set function μ has the properties that

(1) $\mu(\emptyset) = 0$,
(2) $\mu(\bigcup_{n=1}^{N} A_n) = \sum_{n=1}^{N} \mu(A_n)$ for disjoint sets $\{A_n\}$, and
(3) $\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$.

If μ is non-negative and additive and if A is contained in B, then
$\mu(A) \le \mu(B)$. To see this, set $C = B \cap \tilde{A}$ so that A and C are disjoint
and $A \cup C = B$. Then $\mu(A) + \mu(C) = \mu(B)$ by additivity, and the
result follows at once. We shall now establish two facts about
completely additive set functions.


Proposition 1-16: Let μ be an additive set function defined on a field
of sets F. Let $\{A_n\}$ be a sequence of sets in F such that $A_1 \subset A_2 \subset \cdots$,
and suppose $A = \bigcup_{n=1}^{\infty} A_n$ is in F. If μ is completely additive, then
$\lim_{n \to \infty} \mu(A_n) = \mu(A)$. Conversely, if $\lim_{n \to \infty} \mu(A_n) = \mu(A)$ for all
such sequences, μ is completely additive.


Proof: Set $B_1 = A_1$ and $B_n = A_n \cap \tilde{A}_{n-1}$ for $n \ge 2$. Then $A_n = \bigcup_{k=1}^{n} B_k$
disjointly, and by additivity $\mu(A_n) = \sum_{k=1}^{n} \mu(B_k)$. But $A = \bigcup_{k=1}^{\infty} B_k$
and by complete additivity $\mu(A) = \sum_{k=1}^{\infty} \mu(B_k)$, which is the limit of the
partial sums $\mu(A_n)$. The proof of the converse is left to the reader.


A consequence of this proposition is the following: 


Corollary 1-17: Let μ be an additive set function defined on a field
of sets F in such a way that $\mu(A) < \infty$ for every A. Let $\{A_n\}$ be a
sequence of sets in F such that $A_1 \supset A_2 \supset A_3 \supset \cdots$ and $\bigcap_{n=1}^{\infty} A_n = \emptyset$.
If μ is completely additive, then $\lim_{n \to \infty} \mu(A_n) = 0$. Conversely, if
$\lim_{n \to \infty} \mu(A_n) = 0$ for all such sequences, then μ is completely additive.


A non-negative completely additive set function on a field of sets F 
is called a measure. The set of points X with a measure defined on its 
field F is called a measure space. We shall usually denote measures by
μ or ν. If there is no ambiguity about what measure is involved, we
shall frequently refer to X by itself as the measure space. 

If X is a measure space with field of sets F and measure μ, then X
is a set in F, and we define μ(X) to be the total measure of the space.
A probability space is a measure space of total measure one. 
We give four examples of measure spaces.


(1) Let X be any set, let $F = \{\emptyset, X\}$, and define $\mu(\emptyset) = 0$ and
$\mu(X) = a \ge 0$. Then X is the trivial measure space.

(2) Let X be Euclidean n-space, let F be the Lebesgue measurable
sets, and let μ be Lebesgue measure (the natural generalization of
length, area, or volume).

(3) Let X be the set of six possible outcomes for tossing a die.
Assign weight 1/6 to each of the six points in the space, and for any subset
of X assign as a measure the sum of the weights of the points in the
set. Then F is the family of all subsets of X, and X is a probability
space.

(4) Let X be a denumerable index set, and let π be a non-negative
row vector with X as its index set. Assign as a weight to each point
of X the value of the corresponding entry of π. For any subset of X
assign as a measure the sum of the weights of the points in the set.
Then F is the family of all subsets of X, and X is a measure space
with total measure $\pi\mathbf{1}$.
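Examples (3) and (4) both define a measure by summing point weights. A Python sketch for a finite index set follows. It uses exact `Fraction` arithmetic, and the function name `measure_from_weights` is hypothetical. For a countably infinite X the same rule applies, but the sums can no longer be evaluated by a finite loop.

```python
from fractions import Fraction

def measure_from_weights(weights):
    """Return mu with mu(A) = sum of the weights of the points of A."""
    def mu(A):
        return sum((weights[x] for x in A), Fraction(0))
    return mu

die = measure_from_weights({face: Fraction(1, 6) for face in range(1, 7)})
print(die({2, 4, 6}))         # 1/2
print(die(set(range(1, 7))))  # 1, so the die space is a probability space
```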


The sets of a field on which a measure μ is defined are called the
μ-measurable or simply the measurable subsets of X. In the construction
of a measure on a field, it is possible for a non-empty set A to be
assigned measure zero. In example (2) above, for instance, every
denumerable set and even certain uncountable sets are sets of measure
zero. Suppose B is a subset of such a set A. If B is measurable, then
$\mu(B) \ge 0$ since μ is a measure. But

$$\mu(B) \le \mu(A) = 0$$


since B C A and A is of measure zero. Thus, a measurable subset of a 
set of measure zero is of measure zero. But there is no reason why such 
a set B has to be measurable. However, one can agree to add all 
subsets of sets of measure zero to a field and extend the resulting family 
of sets to the smallest field containing the family. Such an extended 
field is called an augmented field. It consists precisely of all sets of the 
form $(C - D) \cup E$, where C is a set in the original field and D and E
are subsets of a set of measure zero. Therefore the augmented field of
a Borel field is again a Borel field. Note that in any augmented field
every subset of a set of measure zero is measurable and has measure
zero. In later chapters of this book all fields will be augmented.

If a statement about the points of a measure space X fails to be true 
only for a set of points which is a subset of a set of measure zero, we 
say that the statement holds for almost all points of X or that it is true 
almost everywhere (abbreviated a.e.). 
Proposition 1-18: Let μ be a measure defined on a field of sets F.
If $\{A_n\}$ is a sequence of sets in F, if A is in F, and if $A \subset \bigcup_n A_n$, then

$$\mu(A) \le \sum_n \mu(A_n).$$


Proof: Write $B_n = A_n - \left(\bigcup_{k=1}^{n-1} A_k\right)$. The sets $B_n$ are disjoint in
pairs, and consequently the sets $A \cap B_n$ are also disjoint.
Furthermore, $\bigcup_n B_n = \bigcup_n A_n$, so that

$$A = A \cap \left(\bigcup_n A_n\right) = A \cap \left(\bigcup_n B_n\right) = \bigcup_n (A \cap B_n).$$

By hypothesis μ is a measure. It is therefore completely additive and

$$\begin{aligned} \mu(A) &= \sum_n \mu(A \cap B_n) \\ &\le \sum_n \mu(B_n) \quad \text{since } A \cap B_n \subset B_n \\ &\le \sum_n \mu(A_n) \quad \text{since } B_n \subset A_n. \end{aligned}$$


To conclude this section we shall establish a result known as the 
Extension Theorem. The proof follows the proof of Rudin [1953]. 


Theorem 1-19: Let F be a field of sets in a space X and let ν be a
measure defined on F. Suppose X can be written as the denumerable
union of sets in F of finite measure. If Y is the smallest Borel field
containing F, then ν can be extended in one and only one way to a
measure defined on all of Y which agrees with ν on sets of F.


Before proving the theorem, we need some preliminary lemmas and
definitions. The property in the statement of the theorem that X is
the denumerable union of sets of finite measure is summarized by saying
that ν is sigma-finite.

Let ν be a measure defined on a field of sets F in a space X, and
suppose $X = \bigcup_{n=1}^{\infty} A_n$ with $A_n \in F$ and $\nu(A_n) < \infty$. For each subset
B of X, define $\mu(B) = \inf \{\sum_n \nu(B_n)\}$, where the infimum is taken over all
denumerable coverings of B by sets $\{B_n\}$ of F.
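On a finite space the infimum in this definition can be computed directly. Since ν is non-negative, repeating a set in a covering never lowers the total, so it suffices to search over finite subfamilies of F. The Python sketch below (hypothetical name `outer_measure`, exact `Fraction` arithmetic assumed) does this by brute force, which is practical only for very small fields. It also illustrates the statement of Lemma 1-20 that the result agrees with ν on sets of F.

```python
from fractions import Fraction
from itertools import combinations

def outer_measure(B, field, nu):
    """mu(B) = inf of sum nu(B_n) over coverings of B by sets of a finite field.

    Because nu >= 0, repeating a set never lowers the total, so the infimum is
    a minimum over subfamilies of the field (brute force: 2^len(field) cases).
    """
    sets = list(field)
    best = None
    for r in range(len(sets) + 1):
        for cover in combinations(sets, r):
            if set(B) <= set().union(*cover):
                total = sum((nu[S] for S in cover), Fraction(0))
                if best is None or total < best:
                    best = total
    return best

X = frozenset({1, 2, 3})
F = [frozenset(), frozenset({1}), frozenset({2, 3}), X]
nu = {F[0]: Fraction(0), F[1]: Fraction(1, 2), F[2]: Fraction(1, 2), X: Fraction(1)}
print(outer_measure({2}, F, nu))  # 1/2, via the covering {{2, 3}}
```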


Lemma 1-20: The set function μ is non-negative. If A and B are
subsets of X such that $A \subset B$, then $\mu(A) \le \mu(B)$. If C is a set in F,
then $\mu(C) = \nu(C)$.

Proof: We see that μ is non-negative because μ is the infimum of
non-negative quantities. If $A \subset B$, then $\mu(A) \le \mu(B)$ because every
covering of B is a covering of A. Let C be in F. Then $\{C\}$ is a
covering of C and $\mu(C) \le \nu(C)$. And for any covering $\{C_n\}$ of C by sets of F,

$$\nu(C) \le \sum_n \nu(C_n)$$

by Proposition 1-18. Therefore,

$$\nu(C) \le \inf \sum_n \nu(C_n) = \mu(C).$$


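Lemma 1-21: If $\{A_n\}$ is an arbitrary sequence of subsets of X and if
$A = \bigcup_n A_n$, then $\mu(A) \le \sum_n \mu(A_n)$.


Proof: If some $\mu(A_n)$ is infinite there is nothing to prove, so suppose
all are finite. Let $\varepsilon > 0$ be given. Let $\{B_k^{(n)}\}$ with k = 1, 2, 3, ... be a
denumerable covering of $A_n$ such that $B_k^{(n)}$ is in F and
$\sum_k \nu(B_k^{(n)}) < \mu(A_n) + \varepsilon/2^n$. This choice is possible by the definition of μ.
Then since all the B's form a covering of A, we have

$$\mu(A) \le \sum_n \sum_k \nu(B_k^{(n)}) \le \sum_n \mu(A_n) + \varepsilon$$

and the assertion follows.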


We define a set theoretic operation $\oplus$ for subsets of $X$ by
$$A \oplus B = (A \cap \tilde{B}) \cup (B \cap \tilde{A}),$$
where $\tilde{A} = X - A$ denotes the complement of $A$. The set $A \oplus B$ is called the symmetric difference of $A$ and $B$. A point is in $A \oplus B$ if it is in $A$ or $B$ but not both. We leave the details of the proof of the next lemma to the reader.

Lemma 1-22: The subsets of a space $X$ form a ring under the operations $\oplus$ and $\cap$ with additive identity $\emptyset$ and multiplicative identity $X$. Every set is its own additive inverse.
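On a finite space the identities of Lemma 1-22 can be checked by exhaustive enumeration. The Python sketch below does this for a hypothetical three-point space; the names `X`, `all_subsets`, `sym_diff`, and `check_ring` are illustrative and do not come from the text. It checks the additive and multiplicative identities, the self-inverse property, associativity of $\oplus$ and $\cap$, and the distributive law $A \cap (B \oplus C) = (A \cap B) \oplus (A \cap C)$ on one example; it is a sanity check, not a proof.

```python
from itertools import chain, combinations

# Hypothetical three-point space, used only for illustration.
X = frozenset({1, 2, 3})
EMPTY = frozenset()


def all_subsets(space):
    """Return every subset of `space` as a frozenset."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def sym_diff(a, b):
    """Symmetric difference A (+) B = (A minus B) union (B minus A)."""
    return (a - b) | (b - a)


def check_ring(space):
    """Brute-force check of the ring identities of Lemma 1-22 on `space`."""
    subsets = all_subsets(space)
    for a in subsets:
        assert sym_diff(a, EMPTY) == a       # additive identity
        assert a & space == a                # multiplicative identity
        assert sym_diff(a, a) == EMPTY       # each set is its own inverse
        for b in subsets:
            assert sym_diff(a, b) == sym_diff(b, a)
            for c in subsets:
                assert sym_diff(sym_diff(a, b), c) == sym_diff(a, sym_diff(b, c))
                assert (a & b) & c == a & (b & c)
                # intersection distributes over symmetric difference
                assert a & sym_diff(b, c) == sym_diff(a & b, a & c)
    return True


print(check_ring(X))  # True
```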


Define a distance $d$ between subsets of $X$ by $d(A, B) = \mu(A \oplus B)$. We note that $d$ has the properties
$$d(A, A) = \mu(\emptyset) = 0$$
and
$$d(A, B) = \mu(A \oplus B) = d(B, A).$$
Since
$$A \cup B = (A \oplus B) \cup B,$$
we have
$$A \subset (A \oplus B) \cup B,$$
and by Lemmas 1-20 and 1-21
$$\mu(A) \le d(A, B) + \mu(B).$$
Replacing $A$ by $A \oplus B$ and $B$ by $C \oplus B$, and noting that $(A \oplus B) \oplus (C \oplus B) = A \oplus C$ by Lemma 1-22, we obtain the triangle inequality
$$d(A, B) \le d(A, C) + d(C, B).$$


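Lemma 1-23: For any subsets $A_1$, $A_2$, $B_1$, $B_2$, $A$, and $B$ of $X$,
$$d(A_1 \cup A_2, B_1 \cup B_2) \le d(A_1, B_1) + d(A_2, B_2)$$
$$d(A_1 \cap A_2, B_1 \cap B_2) \le d(A_1, B_1) + d(A_2, B_2)$$
$$d(\tilde{B}, \tilde{A}) = d(B, A).$$

PROOF: We prove only the first and third assertions. First we observe that
$$(A_1 \cup A_2) \oplus (B_1 \cup B_2) \subset (A_1 \oplus B_1) \cup (A_2 \oplus B_2).$$
For suppose $x \in (A_1 \cup A_2) \oplus (B_1 \cup B_2)$. We may assume without loss of generality that $x \in A_1 \cup A_2$ but $x \notin B_1 \cup B_2$. If $x \in A_1$, then $x \notin B_1$, so that $x \in A_1 \oplus B_1$. Similarly if $x \in A_2$, then $x \in A_2 \oplus B_2$, and the containment is established. The first assertion of the lemma now follows by applying Lemmas 1-20 and 1-21. For the third part, we have
$$\tilde{B} \oplus \tilde{A} = (\tilde{B} \cap A) \cup (\tilde{A} \cap B)$$
and
$$B \oplus A = (B \cap \tilde{A}) \cup (A \cap \tilde{B}),$$
so that
$$\tilde{B} \oplus \tilde{A} = B \oplus A.$$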
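When $\mu$ is a finite measure defined on every subset of a finite space, the distance $d(A, B) = \mu(A \oplus B)$ can be computed directly, and the triangle inequality together with the three assertions of Lemma 1-23 can be checked exhaustively. The sketch below assumes hypothetical point weights `WEIGHTS`, chosen as exact binary fractions so that the floating-point sums are exact; the helper names are illustrative. Python's `^` operator on frozensets is the symmetric difference.

```python
from itertools import chain, combinations

# Hypothetical point masses; mu(A) is the total weight of the points in A.
WEIGHTS = {1: 0.5, 2: 0.25, 3: 2.0, 4: 1.0}
X = frozenset(WEIGHTS)


def mu(a):
    return sum(WEIGHTS[x] for x in a)


def d(a, b):
    """d(A, B) = mu(A (+) B); `^` is symmetric difference on frozensets."""
    return mu(a ^ b)


def all_subsets(space):
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def check_distance(space):
    subsets = all_subsets(space)
    for a in subsets:
        assert d(a, a) == 0
        for b in subsets:
            assert d(a, b) == d(b, a)
            assert d(space - b, space - a) == d(b, a)    # third part of Lemma 1-23
            for c in subsets:
                assert d(a, b) <= d(a, c) + d(c, b)      # triangle inequality
    for a1 in subsets:
        for a2 in subsets:
            for b1 in subsets:
                for b2 in subsets:
                    bound = d(a1, b1) + d(a2, b2)
                    assert d(a1 | a2, b1 | b2) <= bound  # first part
                    assert d(a1 & a2, b1 & b2) <= bound  # second part
    return True


print(check_distance(X))  # True
```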


Definition 1-24: Convergence of sets in measure is defined by saying that $A_n \to A$ if $\lim_{n\to\infty} d(A_n, A) = 0$. Let $\mathcal{F}^*$ be the collection of all subsets $A$ of $X$ for which there exists a sequence $\{A_n\}$ of sets in $\mathcal{F}$ having the property that $A_n \to A$. Let $\mathcal{G}^*$ be the family of denumerable unions of sets in $\mathcal{F}^*$.

Lemma 1-25: If $\{A_n\}$ and $\{B_n\}$ are sequences of sets in $\mathcal{F}$ such that $A_n \to A$ and $B_n \to B$, then $A_n \cup B_n \to A \cup B$, $A_n \cap B_n \to A \cap B$, and $\tilde{A}_n \to \tilde{A}$. Therefore, $\mathcal{F}^*$ is a field of sets. For any $C_n \to C$, $\lim_n \mu(C_n) = \mu(C)$.

PROOF: Since by Lemma 1-23
$$d(A_n \cup B_n, A \cup B) \le d(A_n, A) + d(B_n, B)$$
$$d(A_n \cap B_n, A \cap B) \le d(A_n, A) + d(B_n, B)$$
$$d(\tilde{A}_n, \tilde{A}) = d(A_n, A),$$
we have $A_n \cup B_n \to A \cup B$, $A_n \cap B_n \to A \cap B$, and $\tilde{A}_n \to \tilde{A}$. The limit of $\mu(C_n)$ is established by the inequalities
$$\mu(C_n) \le d(C_n, C) + \mu(C)$$
and
$$\mu(C) \le d(C_n, C) + \mu(C_n).$$


Lemma 1-26: $\mu$ is additive on $\mathcal{F}^*$.

PROOF: Let $A$ and $B$ be disjoint sets in $\mathcal{F}^*$ and pick $\{A_n\}$ and $\{B_n\}$ in $\mathcal{F}$ such that $A_n \to A$ and $B_n \to B$. Then since $\nu$ is additive on $\mathcal{F}$ and since $\mu$ agrees with $\nu$ on sets of $\mathcal{F}$, we have
$$\mu(A_n \cup B_n) + \mu(A_n \cap B_n) = \mu(A_n) + \mu(B_n).$$
By Lemma 1-25,
$$\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B),$$
or, since $A \cap B = \emptyset$,
$$\mu(A \cup B) = \mu(A) + \mu(B).$$

Lemma 1-27: If $A = \bigcup_n A_n$ with $A$ in $\mathcal{G}^*$ and $\{A_n\}$ a sequence of disjoint sets in $\mathcal{F}^*$, then $\mu(A) = \sum_n \mu(A_n)$.

PROOF: Since $A \supset (A_1 \cup A_2 \cup \cdots \cup A_k)$, we have, by Lemma 1-20, $\mu(A) \ge \mu(A_1 \cup A_2 \cup \cdots \cup A_k)$, and by Lemma 1-26 the right side equals $\sum_{n=1}^{k} \mu(A_n)$ for each $k$. Hence
$$\mu(A) \ge \sum_n \mu(A_n),$$
and equality holds by Lemma 1-21.
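Lemma 1-28: If $A$ is in $\mathcal{G}^*$ and if $\mu(A)$ is finite, then $A$ is in $\mathcal{F}^*$.

Remark: If $A$ is in $\mathcal{F}^*$, then $\mu(A)$ is not necessarily finite.

PROOF OF LEMMA: Since $\mathcal{F}^*$ is a field, we may write $A = \bigcup_n A_n$ with $\{A_n\}$ in $\mathcal{F}^*$ and the $A_n$ disjoint sets. Set $B_n = \bigcup_{k=1}^{n} A_k$. Then
$$d(A, B_n) = \mu\big((A \cap \tilde{B}_n) \cup (\tilde{A} \cap B_n)\big) = \mu\Big(\bigcup_{k=n+1}^{\infty} A_k\Big),$$
which by Lemma 1-27 equals $\sum_{k=n+1}^{\infty} \mu(A_k)$. Since the last expression on the right is the tail of a convergent series (its full sum is $\mu(A) < \infty$, again by Lemma 1-27), we have $B_n \to A$. Since $B_n \in \mathcal{F}^*$, we can find $L_n$ in $\mathcal{F}$ such that $d(L_n, B_n) < 1/n$. Then $d(L_n, A) < 1/n + d(B_n, A)$, and hence $L_n \to A$. Thus $A$ is in $\mathcal{F}^*$.

Remark: If $A \in \mathcal{G}^*$ with $\mu(A)$ finite, then $A$ is in $\mathcal{F}^*$; hence for every $\epsilon > 0$ there is a $B$ in $\mathcal{F}$ such that $\mu(A \oplus B) < \epsilon$. Conversely, if there exists such a $B_\epsilon$ for every $\epsilon$, then $B_{1/n} \to A$, so that $A \in \mathcal{F}^*$ and, a fortiori, $A \in \mathcal{G}^*$. These observations give a characterization of the sets $A$ in $\mathcal{G}^*$ for which $\mu(A)$ is finite.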


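Lemma 1-29: $\mu$ is completely additive on $\mathcal{G}^*$, and $\mathcal{G}^*$ is a Borel field.

PROOF: Suppose
$$A = \bigcup_n A_n$$
is the union of disjoint sets $\{A_n\}$ in $\mathcal{G}^*$. Then $\mu(A) \ge \mu(A_n)$ for every $n$ by Lemma 1-20, so that if some $\mu(A_n)$ is infinite, both $\mu(A)$ and $\sum_n \mu(A_n)$ are infinite; we may therefore assume $\mu(A_n) < \infty$ for every $n$. The complete additivity of $\mu$ now follows from Lemmas 1-28 and 1-27. For the proof that $\mathcal{G}^*$ is a Borel field, we see clearly that $\mathcal{G}^*$ is closed under denumerable unions. It remains to be proved that $\mathcal{G}^*$ is closed under complementation. Since $\nu$ is sigma-finite, let
$$X = \bigcup_n A_n$$
with $A_n$ in $\mathcal{F}$ and $\mu(A_n) = \nu(A_n) < \infty$. Let $B$ in $\mathcal{G}^*$ be given and suppose $B = \bigcup_k B_k$ with $B_k$ in $\mathcal{F}^*$. Since
$$A_n \cap B = \bigcup_k (A_n \cap B_k)$$
and since $A_n \cap B_k$ is in $\mathcal{F}^*$, $A_n \cap B$ is in $\mathcal{G}^*$. But by Lemma 1-20,
$$\mu(A_n \cap B) \le \mu(A_n),$$
and we have assumed that $\mu(A_n) < \infty$. Thus by Lemma 1-28, $A_n \cap B \in \mathcal{F}^*$, and since $\mathcal{F}^*$ is a field,
$$A_n \cap \big(X - (A_n \cap B)\big) = A_n \cap \tilde{B}$$
is in $\mathcal{F}^*$. Therefore
$$\tilde{B} = X \cap \tilde{B} = \bigcup_n (A_n \cap \tilde{B})$$
is in $\mathcal{G}^*$, and the proof is complete.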


We are now in a position to prove the Extension Theorem. 


PROOF OF THEOREM 1-19: Existence of the extension of $\nu$ to a measure $\mu$ defined on $\mathcal{G}^*$ is proved by Lemmas 1-20 and 1-29. Since, by Lemma 1-29, $\mathcal{G}^*$ is a Borel field containing $\mathcal{F}$, $\mathcal{G}^*$ contains $\mathcal{G}$. The extended measure restricted to sets of $\mathcal{G}$ has the desired properties.

For uniqueness, suppose $\mu'$ is another measure on $\mathcal{G}$ that agrees with $\nu$ on $\mathcal{F}$. Since, by sigma-finiteness, $X$ is the union of sets $A_n$ in $\mathcal{F}$ of finite $\nu$-measure, we may assume that $X$ is a disjoint union of sets of finite measure by letting $B_n = A_n - \bigcup_{k<n} A_k$. Let $C$ be any set in $\mathcal{G}$; we want to show $\mu'(C) = \mu(C)$. By definition,
$$\mu(C) = \inf \Big\{ \sum_n \nu(C_n) \Big\},$$
with the infimum taken over covers $\{C_n\}$, where $C_n$ is in $\mathcal{F}$. For any fixed cover $\{C_n\}$, we have
$$\mu'(C) \le \mu'\Big(\bigcup_n C_n\Big) \le \sum_n \mu'(C_n) = \sum_n \nu(C_n).$$
Therefore
$$\mu'(C) \le \inf \Big\{ \sum_n \nu(C_n) \Big\} = \mu(C).$$
Writing
$$\mu(C) = \sum_n \mu(C \cap B_n) \quad \text{and} \quad \mu'(C) = \sum_n \mu'(C \cap B_n),$$
we see that it is sufficient to show that
$$\mu'(C \cap B_n) = \mu(C \cap B_n).$$
But
$$\mu'(C \cap B_n) + \mu'(\tilde{C} \cap B_n) = \nu(B_n) = \mu(C \cap B_n) + \mu(\tilde{C} \cap B_n).$$
Now we know that $\mu'$ is dominated by $\mu$:
$$\mu'(\tilde{C} \cap B_n) \le \mu(\tilde{C} \cap B_n) \le \mu(B_n) < \infty.$$
If
$$\mu'(C \cap B_n) < \mu(C \cap B_n),$$
we obtain the contradiction
$$\mu'(C \cap B_n) + \mu'(\tilde{C} \cap B_n) < \mu(C \cap B_n) + \mu(\tilde{C} \cap B_n).$$


3. Measurable functions and Lebesgue integration 


Let F be a Borel field of sets in a set X. The measurable sets of X 
are the sets of F. 


Definition 1-30: Let f be a function with domain X and with range 
the extended real number system. The function f is said to be a 
measurable function if for each real number c the set {x | f(x) < c} is 
measurable. 
The content of the next proposition is that the property $f(x) < c$ may be replaced by any of the conditions $f(x) \le c$, $f(x) > c$, or $f(x) \ge c$. Therefore, if $f$ is a measurable function, then the set
$$\{x \mid c < f(x) < d\}$$
is measurable; either or both of the signs $<$ may be replaced by $\le$, and the set is still measurable.


Proposition 1-31: The following four conditions are equivalent:

(1) $\{x \mid f(x) < c\}$ is measurable for every $c$.
(2) $\{x \mid f(x) \le c\}$ is measurable for every $c$.
(3) $\{x \mid f(x) > c\}$ is measurable for every $c$.
(4) $\{x \mid f(x) \ge c\}$ is measurable for every $c$.

PROOF: From
$$\{x \mid f(x) \le c\} = \bigcap_{n=1}^{\infty} \{x \mid f(x) < c + 1/n\},$$
$$\{x \mid f(x) > c\} = X - \{x \mid f(x) \le c\},$$
$$\{x \mid f(x) \ge c\} = \bigcap_{n=1}^{\infty} \{x \mid f(x) > c - 1/n\},$$
and
$$\{x \mid f(x) < c\} = X - \{x \mid f(x) \ge c\},$$
we see that (1) implies (2), that (2) implies (3), that (3) implies (4), and that (4) implies (1).


Proposition 1-32: Every constant function is measurable. 


PROOF: If $f(x) = a$ identically, then $\{x \in X \mid f(x) < c\}$ is either $\emptyset$ or $X$.


In analogy with our procedure for matrices in Section 1, we define $f^+$ and $f^-$ by
$$f^+(x) = \max\{f(x), 0\}$$
$$f^-(x) = -\min\{f(x), 0\}.$$

Proposition 1-33: If $f$ is measurable, then so are $f^+$, $f^-$, and $|f|$.

PROOF:
$$\{x \mid f^+(x) \ge c\} = \{x \mid f(x) \ge c\} \cup \{x \mid 0 \ge c\}$$
$$\{x \mid f^-(x) \ge c\} = \{x \mid f(x) \le -c\} \cup \{x \mid 0 \ge c\}.$$
The set $\{x \mid 0 \ge c\}$ is either $\emptyset$ or $X$ and is therefore measurable. For $|f|$ we have
$$\{x \mid |f(x)| < c\} = \{x \mid -c < f(x) < c\}.$$


Proposition 1-34: Let $f$ and $g$ be measurable functions whose values are finite at all points. Then $f + g$ and $f \cdot g$ are measurable.

PROOF: We prove only the assertion about $f + g$. Order the rational numbers and call the $n$th one $r_n$. Then
$$\{x \mid f(x) + g(x) > c\} = \bigcup_{n=1}^{\infty} \big(\{x \mid f(x) > c + r_n\} \cap \{x \mid g(x) > -r_n\}\big),$$
so that $f + g$ is measurable.
Corollary 1-35: If f is measurable, then so is cf for every constant c. 


Proposition 1-36: Let $\{f_n\}$ be a sequence of measurable functions. Then the functions

(1) $\sup_n f_n(x)$
(2) $\inf_n f_n(x)$
(3) $\limsup_n f_n(x)$
(4) $\liminf_n f_n(x)$

are all measurable.

PROOF: The assertions follow from the observations that
$$\{x \mid \sup_n f_n(x) > c\} = \bigcup_n \{x \mid f_n(x) > c\},$$
$$\{x \mid \inf_n f_n(x) < c\} = \bigcup_n \{x \mid f_n(x) < c\},$$
$$\limsup_n f_n(x) = \inf_n \sup_{m \ge n} f_m(x),$$
and
$$\liminf_n f_n(x) = \sup_n \inf_{m \ge n} f_m(x).$$

The supremum of finitely many functions is their pointwise maximum. Therefore the maximum and minimum of finitely many measurable functions are both measurable.

Corollary 1-37: If $\{f_n\}$ is a sequence of measurable functions and if $f = \lim_n f_n$ exists at all points, then $f$ is measurable.




We shall give three examples of measurable functions. 


(1) Let $\mathcal{F}$ be the family of sets on the real line which are either finite unions of open and closed intervals or complements of such sets. Then $\mathcal{F}$ is a field of sets. Let $\mathcal{G}$ be the smallest Borel field containing $\mathcal{F}$. All continuous real-valued functions are measurable with respect to the Borel field $\mathcal{G}$.

(2) Let $X$ be a space for which the Borel field $\mathcal{G}$ is the family of all subsets of $X$. Then every function $f$ defined on $X$ is measurable.

(3) Let $X$ be the union of a sequence of disjoint sets $\{A_n\}$, and let $\mathcal{G}$ be the family of all sets which are unions of sets in the sequence. Then a function $f$ is measurable if and only if its restriction to the domain $A_n$ is a constant function for each $n$. In particular, if $\mathcal{G} = \{X, \emptyset\}$, then the measurable functions are the constant functions.


Let $A$ be any subset of $X$. The characteristic function of $A$, denoted $\chi_A(x)$, is defined by
$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise.} \end{cases}$$

A function that takes on only a finite number of values is called a simple function. It may be represented, uniquely, in the form
$$s = \sum_{n=1}^{N} c_n \chi_{A_n}, \qquad (*)$$
where the $c_n$ are the distinct values the function takes on and the sets $A_n$ are disjoint. The simple function is measurable if and only if all of the sets $A_1, A_2, \ldots, A_N$ are measurable.


Proposition 1-38: For any non-negative function $f$ defined on $X$, there exists a sequence of non-negative simple functions $\{s_n\}$ with the property that for each $x \in X$, $\{s_n(x)\}$ is a monotonically increasing sequence converging to $f(x)$. If $f$ is measurable, the $\{s_n\}$ may be taken to be measurable.

PROOF: For every $n$ and for $1 \le j \le n2^n$ set
$$A_{nj} = \Big\{x \,\Big|\, \frac{j-1}{2^n} \le f(x) < \frac{j}{2^n}\Big\}$$
and
$$B_n = \{x \mid f(x) \ge n\}.$$
Then
$$s_n = \sum_{j=1}^{n2^n} \frac{j-1}{2^n} \chi_{A_{nj}} + n \chi_{B_n}$$
increases monotonically with $n$ to $f$. If $f$ is measurable, then so are $\{A_{nj}\}$ and $B_n$; thus $s_n$ is measurable.
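The construction in the proof of Proposition 1-38 can be evaluated pointwise, since $s_n(x)$ depends only on the value $f(x)$. The Python sketch below (the function name `s_n` is illustrative) returns $s_n(x)$ given $f(x) \ge 0$, allowing $f(x) = \infty$. For $0 \le f(x) < n$ the point lies in the set $A_{nj}$ with $j = \lfloor 2^n f(x) \rfloor + 1$, and otherwise it lies in $B_n$.

```python
import math


def s_n(f_value, n):
    """Value at x of the simple function s_n from Proposition 1-38,
    where f_value = f(x) >= 0 (math.inf is allowed).

    s_n = (j - 1) / 2**n on A_nj = {(j - 1)/2**n <= f < j/2**n}, 1 <= j <= n*2**n,
    s_n = n           on B_n  = {f >= n}.
    """
    if f_value < 0:
        raise ValueError("f must be non-negative")
    if f_value >= n:                      # x lies in B_n
        return float(n)
    j = math.floor(f_value * 2**n) + 1    # x lies in A_nj
    return (j - 1) / 2**n


# The values s_1(x), s_2(x), ... increase to f(x).
for v in (0.3, math.pi, math.inf):
    print(v, [s_n(v, n) for n in range(1, 6)])
```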


If $\mu$ is a measure defined on a Borel field $\mathcal{G}$ of subsets of $X$, we denote the measure space by the ordered triple $(X, \mathcal{G}, \mu)$. In $(X, \mathcal{G}, \mu)$ let $E$ be a set of the family $\mathcal{G}$, and suppose $s$ is a non-negative, measurable simple function, represented as in (*) above. Since $s$ is measurable, $A_n$ is measurable and $\mu(A_n \cap E)$ is defined for every $n$. Set
$$I_E(s) = \sum_{n=1}^{N} c_n \mu(A_n \cap E).$$
For any non-negative measurable function $f$, define the Lebesgue integral of $f$ on the set $E$ with respect to the measure $\mu$ by
$$\int_E f \, d\mu = \sup I_E(s),$$
where the supremum is taken over all measurable simple functions $s$ satisfying $0 \le s \le f$. We note that the value of the integral is non-negative and possibly infinite. It can be verified that if $s$ is a non-negative measurable simple function, then
$$\int_E s \, d\mu = I_E(s).$$
If $f$ is an arbitrary measurable function, then by Proposition 1-33, $\int_E f^+ \, d\mu$ and $\int_E f^- \, d\mu$ are both defined. If the integrals of $f^+$ and $f^-$ are not both infinite, we define the integral of $f$ by $\int_E f \, d\mu = \int_E f^+ \, d\mu - \int_E f^- \, d\mu$. The function $f$ is said to be integrable on the set $E$ if $\int_E f^+ \, d\mu$ and $\int_E f^- \, d\mu$ are both finite.

Following our examples of measure spaces and measurable functions, we give three examples of integration. A fourth example will arise in Chapter 2.

(1) Let $\mathcal{G} = \{\emptyset, X\}$ and suppose $\mu(X) = 1$. Only the constant functions are measurable, and
$$\int_{\emptyset} c \, d\mu = 0, \qquad \int_X c \, d\mu = c.$$

(2) Let $X$ be the real line, let $\mathcal{G}$ be the Borel field of sets constructed in the first example of measurable functions, and let $\mu$ be Lebesgue measure. Continuous functions are measurable, and it can be shown that the value of the Lebesgue integral of a continuous function on a closed interval agrees with the value of the Riemann integral. More generally one finds that every Riemann integrable function on a closed interval is Lebesgue integrable, but not conversely.
(3) Let $X$ be the denumerable set of points described in Example 4 of measure spaces, and let $\pi$ be a non-negative row vector defined on $X$. Then $\pi$ defines a measure on $X$. If $f$ is an arbitrary column vector defined on $X$, then $f$ is a function on the points of $X$. Furthermore, $f$ is measurable since all subsets of $X$ are measurable sets. The reader should verify that the integral of $f$ over the whole space $X$ with respect to the measure $\pi$ is the matrix product $\pi f$ and that the condition for the integral to be defined is precisely the condition for the matrix product to be well defined. Because of this application of Lebesgue integration, we often speak of column vectors as functions and non-negative row vectors as measures. We shall return to this example in Section 5 of this chapter. The proof of the next proposition is left to the reader.
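For a row vector $\pi$ with only finitely many entries, the recipe in Example (3) can be written out directly: split $f$ into $f^+$ and $f^-$, form $\pi f^+$ and $\pi f^-$, and subtract. The Python sketch below (the function name `integral` is illustrative) uses exact fractions. It handles only finitely many points, so both products are always finite; for a denumerable $X$ the integral would be undefined exactly when $\pi f^+$ and $\pi f^-$ are both infinite.

```python
from fractions import Fraction


def integral(pi, f):
    """Integral of the column vector f with respect to the measure given by
    the non-negative row vector pi, computed as pi f^+ - pi f^- (finite X)."""
    if len(pi) != len(f):
        raise ValueError("pi and f must have the same length")
    if any(p < 0 for p in pi):
        raise ValueError("pi must be non-negative to define a measure")
    pos = sum(p * max(v, 0) for p, v in zip(pi, f))    # pi f^+
    neg = sum(p * max(-v, 0) for p, v in zip(pi, f))   # pi f^-
    return pos - neg


pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
f = [2, -4, 1]
print(integral(pi, f))  # 1/4, the same as the matrix product pi f
```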


Proposition 1-39: The Lebesgue integral satisfies these seven properties:

(1) If $c$ is a constant function,
$$\int_E c \, d\mu = c\mu(E).$$

(2) If $f$ and $g$ are measurable functions whose integrals are defined on $E$ and if $f(x) \le g(x)$ for all $x \in E$, then
$$\int_E f \, d\mu \le \int_E g \, d\mu.$$

(3) If $f$ is integrable on $E$ and if $c$ is a real number, then $cf$ is integrable and $\int_E cf \, d\mu = c \int_E f \, d\mu$.

(4) If $f$ is measurable and $\mu(E) = 0$, then $\int_E f \, d\mu = 0$.

(5) If $E'$ and $E$ are measurable sets with $E' \subset E$ and if $f$ is a function for which $\int_E f \, d\mu$ is defined, then $\int_{E'} f \, d\mu$ is defined. In particular, if $f$ is non-negative on $E$,
$$\int_{E'} f \, d\mu \le \int_E f \, d\mu.$$

(6) If $|f(x)| \le c$ for all $x \in E$, if $\mu(E) < \infty$, and if $f$ is measurable, then $f$ is integrable on $E$.

(7) If $f$ is measurable on $E$ and if $|f| \le g$ for a function $g$ integrable on $E$, then $f$ is integrable on $E$.

Corollary 1-40: If $f$ is a non-negative measurable function with $\int_E f \, d\mu = 0$, then $f = 0$ a.e. on $E$.

PROOF: The subset of $E$ where $f(x) \ge 1/n$ must have measure zero since otherwise $f$ would have positive integral by (1) and (2) of Proposition 1-39. The set where $f \ne 0$ on $E$ is the countable union on $n$ of these sets.


4. Integration theorems 


We shall make frequent use of four important facts about the 
Lebesgue integral. We develop these results as the four theorems of 
this section. 


Theorem 1-41: Let $f$ be a fixed measurable function and suppose that $\int_X f \, d\mu$ is defined. Then the set function $\rho(E) = \int_E f \, d\mu$ is completely additive.

PROOF: If we can prove the theorem for non-negative functions, we can write $f = f^+ - f^-$ and apply our result separately to $f^+$ and $f^-$. We therefore assume that $f$ is non-negative. We must show that if $E = \bigcup_{n=1}^{\infty} E_n$ disjointly, then $\rho(E) = \sum_{n=1}^{\infty} \rho(E_n)$. If $f$ is a characteristic function $\chi_A$, then $\rho(E) = \int_E \chi_A \, d\mu = \mu(A \cap E)$, and the complete additivity of $\rho$ is a consequence of the complete additivity of $\mu$. If $f$ is a simple function, the complete additivity of $\rho$ is a consequence of the result for characteristic functions and of the fact that the limit of a sum is the sum of the limits. Thus, for general $f$ we have for every measurable simple function $s$ satisfying $0 \le s \le f$,
$$\int_E s \, d\mu = \sum_{n=1}^{\infty} \int_{E_n} s \, d\mu \le \sum_{n=1}^{\infty} \rho(E_n)$$
by property (2) of Proposition 1-39. Hence
$$\rho(E) = \sup_s \int_E s \, d\mu \le \sum_{n=1}^{\infty} \rho(E_n).$$

We now prove the inequality in the other direction. By property (5) of Proposition 1-39, $\rho(E) \ge \rho(E_n)$ for every $n$ since $f = f^+$. Thus if $\rho(E_n) = +\infty$ for any $n$, the desired result is proved. We therefore assume $\rho(E_n) < \infty$ for every $n$. Let $\epsilon > 0$ be given and choose a measurable simple function $s$ satisfying $0 \le s \le f$,
$$\int_{E_1} s \, d\mu \ge \int_{E_1} f \, d\mu - \frac{\epsilon}{2},$$
and
$$\int_{E_2} s \, d\mu \ge \int_{E_2} f \, d\mu - \frac{\epsilon}{2}.$$
This choice is possible by the definition of the integral as a supremum: take the pointwise maximum of a simple function that works for $E_1$ and one that works for $E_2$. Then
$$\rho(E_1 \cup E_2) = \int_{E_1 \cup E_2} f \, d\mu \ge \int_{E_1 \cup E_2} s \, d\mu = \int_{E_1} s \, d\mu + \int_{E_2} s \, d\mu \ge \int_{E_1} f \, d\mu + \int_{E_2} f \, d\mu - \epsilon = \rho(E_1) + \rho(E_2) - \epsilon.$$
Hence
$$\rho(E_1 \cup E_2) \ge \rho(E_1) + \rho(E_2).$$
By induction, we obtain
$$\rho(E_1 \cup E_2 \cup \cdots \cup E_n) \ge \rho(E_1) + \cdots + \rho(E_n)$$
and
$$\rho(E) \ge \rho(E_1) + \cdots + \rho(E_n) \quad \text{for every } n.$$
Hence, combining this with the first inequality,
$$\rho(E) = \sum_{n=1}^{\infty} \rho(E_n).$$


The proofs of two corollaries of Theorem 1-41 are left as exercises. 
These results use only the additivity of the integral and not the 
complete additivity. 


Corollary 1-42: If $f$ is measurable, if $\int_E f \, d\mu$ is defined, and if
$$\mu(E \oplus F) = 0,$$
then $\int_F f \, d\mu$ is defined and $\int_E f \, d\mu = \int_F f \, d\mu$.

Corollary 1-43: If $\int_E f \, d\mu$ is defined, then
$$\Big| \int_E f \, d\mu \Big| \le \int_E |f| \, d\mu.$$
If $f$ is integrable on $E$, then so is $|f|$.

Let $f$ and $g$ be two functions whose integrals on $E$ are defined. Suppose the set $A = \{x \in E \mid f(x) \ne g(x)\}$ is of measure zero; that is, suppose $f$ and $g$ are equal almost everywhere on $E$. Writing $E = A \cup (E \cap \tilde{A})$, we find by applying Corollary 1-42 twice,
$$\int_E f \, d\mu = \int_{E \cap \tilde{A}} f \, d\mu = \int_{E \cap \tilde{A}} g \, d\mu = \int_E g \, d\mu.$$
Functions which differ on a set of measure zero thus have equal integrals. Therefore, when we are thinking of a function $f$ defined on $X$ in terms of integration, it is sufficient that $f$ be defined at almost all points of $X$. And, if we agree to augment Borel fields of sets by adjoining subsets of sets of measure zero, we see that if $f$ and $g$ differ on a set of measure zero and if $f$ is measurable, then $g$ is measurable. With the convention of augmenting the field, we obtain from Corollary 1-37, for example, the result that if $\{f_n\}$ is a sequence of measurable functions such that
$$f(x) = \lim_n f_n(x)$$
almost everywhere, then $f$ is measurable. Similarly the theorems to follow would be valid with convergence almost everywhere if the underlying Borel field were augmented; the necessary modifications in the proofs are easy.

We now state and prove the Monotone Convergence Theorem, which 
is due to B. Levi. 


Theorem 1-44: Let $E$ be a measurable set, and suppose $\{f_n\}$ is a sequence of measurable functions such that
$$0 \le f_1(x) \le f_2(x) \le \cdots$$
and
$$f(x) = \lim_{n\to\infty} f_n(x).$$
Then
$$\int_E f \, d\mu = \lim_{n\to\infty} \int_E f_n \, d\mu.$$

PROOF: Since the $\{f_n\}$ increase with $n$, so do the $\{\int_E f_n \, d\mu\}$. Therefore
$$k = \lim_{n\to\infty} \int_E f_n \, d\mu$$
exists. Since $f$ is non-negative and is the limit of measurable functions, we know that $\int_E f \, d\mu$ exists, and since $f_n \le f$, we have
$$\int_E f_n \, d\mu \le \int_E f \, d\mu$$
for every $n$. Therefore, $k \le \int_E f \, d\mu$. Let $c$ be a real number satisfying $0 < c < 1$, and let $s$ be a measurable simple function such that $0 \le s \le f$. Set
$$E_n = \{x \in E \mid f_n(x) \ge c s(x)\},$$
so that $E_1 \subset E_2 \subset E_3 \subset \cdots$. Then $E = \bigcup_{n=1}^{\infty} E_n$ (if $s(x) > 0$, then $cs(x) < s(x) \le f(x)$, so $f_n(x) \ge cs(x)$ for large $n$). For any $n$ we have
$$k \ge \int_E f_n \, d\mu \ge \int_{E_n} f_n \, d\mu \ge c \int_{E_n} s \, d\mu.$$
Since the integral is a completely additive set function (Theorem 1-41), we have by Proposition 1-16
$$\lim_{n\to\infty} c \int_{E_n} s \, d\mu = c \int_E s \, d\mu.$$
Thus, as $n \to \infty$,
$$k \ge c \int_E s \, d\mu.$$
Letting $c \to 1$, we have $k \ge \int_E s \, d\mu$, and taking the supremum over all $s$, we find $k = \int_E f \, d\mu$.


Proposition 1-45: Suppose $h = f + g$ with $f$ and $g$ integrable on $E$. Then $h$ is integrable on $E$ and
$$\int_E h \, d\mu = \int_E f \, d\mu + \int_E g \, d\mu.$$

PROOF: We first prove the assertion for $f$ and $g$ non-negative. For simple functions the assertion is trivial. If $f$ and $g$ are not both simple, apply Proposition 1-38 to find monotone sequences of non-negative measurable simple functions $\{t_n\}$ and $\{u_n\}$ converging to $f$ and $g$. Then $\{s_n = t_n + u_n\}$ converges monotonically to $h$, and since
$$\int_E s_n \, d\mu = \int_E t_n \, d\mu + \int_E u_n \, d\mu,$$
the result follows from Theorem 1-44. Next, if $f \ge 0$, $g \le 0$, and $h = f + g \ge 0$, we have $f = h + (-g)$ with $h \ge 0$ and $(-g) \ge 0$, so that
$$\int_E f \, d\mu = \int_E h \, d\mu + \int_E (-g) \, d\mu$$
or
$$\int_E h \, d\mu = \int_E f \, d\mu + \int_E g \, d\mu.$$
Since the right side is finite and since $h \ge 0$, $h$ is integrable. For an arbitrary $h \ge 0$, decompose $E$ into the disjoint union of three sets, one where $f \ge 0$ and $g \ge 0$, one where $f \ge 0$ and $g < 0$, and one where $f < 0$ and $g \ge 0$. Theorem 1-41 then gives the desired result. Finally, for general $h$, write $h = h^+ - h^-$ and consider $h^+$ and $h^-$ separately.

Corollary 1-46: Let $E$ be a measurable set, and suppose $\{f_n\}$ is a sequence of non-negative measurable functions with
$$f(x) = \sum_{n=1}^{\infty} f_n(x).$$
Then
$$\int_E f \, d\mu = \sum_{n=1}^{\infty} \int_E f_n \, d\mu.$$

PROOF: The result follows from Proposition 1-45 and Theorem 1-44.


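Proposition 1-47: If $f$ is a non-negative integrable function, then for every $\epsilon > 0$ there is a $\delta > 0$ such that, if $\mu(E) < \delta$, then
$$\int_E f \, d\mu < \epsilon.$$

PROOF: Set $f_n = \min(f, n)$. By Theorem 1-44,
$$\lim_{n\to\infty} \int_X f_n \, d\mu = \int_X f \, d\mu.$$
Since $f$ is integrable, we may find an $N$ such that
$$\int_X (f - f_N) \, d\mu < \frac{\epsilon}{2}$$
by Proposition 1-45. Let $\delta = \epsilon/(2N)$. If $\mu(E) < \delta$, then
$$\int_E f \, d\mu = \int_E (f - f_N) \, d\mu + \int_E f_N \, d\mu \qquad \text{by Proposition 1-45}$$
$$\le \int_X (f - f_N) \, d\mu + N\mu(E) \qquad \text{by Proposition 1-39}$$
$$< \epsilon.$$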
Our third theorem for this section is known as Fatou’s Theorem. 


Theorem 1-48: Let $E$ be a measurable set, and let $\{f_n\}$ be a sequence of non-negative measurable functions. Then
$$\int_E \liminf_n f_n \, d\mu \le \liminf_n \int_E f_n \, d\mu.$$
In particular, if $f(x) = \lim_n f_n(x)$,
$$\int_E f \, d\mu \le \liminf_n \int_E f_n \, d\mu.$$

PROOF: Set $g_n(x) = \inf_{m \ge n} f_m(x)$. Then $0 \le g_1(x) \le g_2(x) \le \cdots$, and $g_n$ is measurable on $E$. We have $g_n(x) \le f_n(x)$ and
$$\lim_n g_n(x) = \liminf_n f_n(x).$$
By Theorem 1-44,
$$\int_E \liminf_n f_n \, d\mu = \int_E \lim_n g_n \, d\mu = \lim_n \int_E g_n \, d\mu.$$
The result now follows from the inequality $\int_E g_n \, d\mu \le \int_E f_n \, d\mu$.


The fourth basic integration theorem is the Lebesgue Dominated Convergence Theorem.

Theorem 1-49: Let $E$ be a measurable set, and suppose $\{f_n\}$ is a sequence of measurable functions such that for some integrable $g$, $|f_n| \le g$ for all $n$. If $f(x) = \lim_n f_n(x)$, then $\lim_n \int_E f_n \, d\mu$ exists and
$$\int_E f \, d\mu = \lim_n \int_E f_n \, d\mu.$$

PROOF: By property (7) of Proposition 1-39, $f$ is integrable and so is $f_n$ for every $n$. Apply Fatou’s Theorem first to $f_n + g \ge 0$ to obtain
$$\int_E (f + g) \, d\mu \le \liminf_n \int_E (f_n + g) \, d\mu$$
or
$$\int_E f \, d\mu \le \liminf_n \int_E f_n \, d\mu.$$
Apply the theorem once more to $g - f_n \ge 0$ to obtain
$$\int_E (g - f) \, d\mu \le \liminf_n \int_E (g - f_n) \, d\mu$$
or
$$-\int_E f \, d\mu \le \liminf_n \int_E (-f_n) \, d\mu = -\limsup_n \int_E f_n \, d\mu.$$
Combining the two inequalities,
$$\limsup_n \int_E f_n \, d\mu \le \int_E f \, d\mu \le \liminf_n \int_E f_n \, d\mu.$$
Therefore $\lim_n \int_E f_n \, d\mu$ exists and has the value asserted.

Corollary 1-50: Let $E$ be a set of finite measure and suppose $\{f_n\}$ is a sequence of measurable functions such that $|f_n| \le c$ for all $n$. If $f(x) = \lim_n f_n(x)$, then $\int_E f \, d\mu = \lim_n \int_E f_n \, d\mu$.


Much of the discussion of this section has dealt with the following problem: A sequence of integrable functions $f_n$ converges a.e. to a function $f$; we want to be able to conclude that $\int f_n \, d\mu$ tends to $\int f \, d\mu$. First we should note that at almost all points $f_n^+ \to f^+$ and $f_n^- \to f^-$, and hence it is sufficient to check the convergence of the integral separately for these two sequences. Thus we may consider the case $f_n \ge 0$ alone. For non-negative functions Fatou’s inequality is the only general result; one cannot conclude equality without a further hypothesis. Two sufficient conditions are given by monotone and dominated convergence.

But if we restrict our attention to a space of finite total measure, then we can give a necessary and sufficient condition.


Definition 1-51: A sequence $\{f_n\}$ of non-negative integrable functions is said to be uniformly integrable if for each $\epsilon > 0$ there is a number $k$ such that
$$\int_{\{x \mid f_n(x) > k\}} f_n \, d\mu < \epsilon$$
holds for every $n$.

Equivalently we may require that the inequality holds for all sufficiently large $n$. For suppose it holds for $n > N$ and for the number $c$. Then since each $f_n$ is integrable, there is a $k_n$, depending on $n$ (and, of course, $\epsilon$), such that
$$\int_{\{x \mid f_n(x) > k_n\}} f_n \, d\mu < \epsilon,$$
and we may choose $k = \max\{k_1, k_2, \ldots, k_N, c\}$.

Proposition 1-52: If $\{f_n\}$ is a sequence of non-negative integrable functions tending to $f$ and if $\mu(E) < \infty$, then $\int_E f_n \, d\mu \to \int_E f \, d\mu$ if and only if the $\{f_n\}$ are uniformly integrable.

Remark: The sequence $\{f_n\}$ need converge to $f$ only almost everywhere for the proposition to be valid, provided $f$ is assumed measurable. This measurability condition is always satisfied if the underlying Borel field is augmented.
PROOF: We shall use the notation $f^{(k)}$ for the function truncated at $k$; that is, $f^{(k)}(x) = \inf(f(x), k)$. Let
$$A_n = \int_E f \, d\mu - \int_E f_n \, d\mu,$$
$$B^k = \int_E f \, d\mu - \int_E f^{(k)} \, d\mu,$$
$$C_n^k = \int_E f^{(k)} \, d\mu - \int_E f_n^{(k)} \, d\mu,$$
$$D_n^k = \int_E f_n \, d\mu - \int_E f_n^{(k)} \, d\mu = \int_{\{x \mid f_n > k\}} (f_n - k) \, d\mu.$$
We have
$$A_n = B^k + C_n^k - D_n^k.$$
Clearly $f^{(k)}$ increases monotonically to $f$, so that $B^k$ tends to 0 by monotone convergence. Since $f_n \to f$, $f_n^{(k)} \to f^{(k)}$. But $f_n^{(k)}$ is bounded by $k$. Thus, on the totally finite measure space $E$, $\lim_n C_n^k = 0$ by Corollary 1-50. Hence by choosing a large $k$ and then a sufficiently large $n$ (depending on $k$), we can make the first two terms on the right side as small as desired. If the functions are uniformly integrable, then we can find a large $k$ (perhaps larger than the one already chosen) for which $D_n^k$ will be small for all $n$, since $0 \le D_n^k \le \int_{\{x \mid f_n > k\}} f_n \, d\mu$. Hence, for all sufficiently large $n$, the left side is small. Thus the integrals converge.

Conversely, suppose that $A_n \to 0$. Since $D_n^k = B^k + C_n^k - A_n$, there is a $k$ such that for all sufficiently large $n$, $D_n^k < \epsilon/2$. Thus for all $n$ sufficiently large, we have
$$\frac{\epsilon}{2} > \int_{\{x \mid f_n > k\}} (f_n - k) \, d\mu \ge \int_{\{x \mid f_n > 2k\}} (f_n - k) \, d\mu \ge \frac{1}{2} \int_{\{x \mid f_n > 2k\}} f_n \, d\mu,$$
since $f_n - k > \frac{1}{2} f_n$ on the set in the last two integrals. Taking $2k$ as the number in the equivalent definition, we see that we have uniform integrability.
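Both sides of the equivalence in Proposition 1-52 fail together in a simple example, constructed here for illustration rather than taken from the text. Let $X = \{1, 2, 3, \ldots\}$ carry the finite measure $\mu(\{j\}) = 2^{-j}$ and let $f_n = 2^n \chi_{\{n\}}$. Then $f_n \to 0$ at every point, yet $\int_X f_n \, d\mu = 1$ for all $n$, so the integrals do not converge to $\int_X 0 \, d\mu = 0$; correspondingly, for any $k$ the integral of $f_n$ over $\{x \mid f_n(x) > k\}$ equals 1 as soon as $2^n > k$, so the sequence is not uniformly integrable. The Python sketch below (the names `N`, `weight`, `f`, and `integral` are hypothetical) checks these facts with exact fractions on the first `N` points, which suffices for $n \le N$ because $f_n$ vanishes off the point $n$.

```python
from fractions import Fraction

N = 40  # number of points examined; f_n vanishes off {n}, so n <= N suffices


def weight(j):
    """mu({j}) = 2**-j, a finite measure on {1, 2, 3, ...}."""
    return Fraction(1, 2**j)


def f(n, j):
    """f_n = 2**n at the point n and 0 elsewhere."""
    return 2**n if j == n else 0


def integral(n, k=None):
    """Integral of f_n over X, or over {x | f_n(x) > k} when k is given."""
    total = Fraction(0)
    for j in range(1, N + 1):
        value = f(n, j)
        if k is None or value > k:
            total += value * weight(j)
    return total


print([integral(n) for n in (1, 5, 20)])   # each integral equals 1
print([f(n, 3) for n in (3, 4, 5)])        # at the point 3, f_n is 0 once n > 3
```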


5. Limit theorems for matrices 


We have already said that if $\pi$ is a row vector and if $\{f^{(n)}\}$ is a sequence of column vectors converging to $f$, then it is not necessarily true that $\pi f^{(n)}$ converges to $\pi f$. Our object in this section is to obtain sufficient criteria to justify saying that $\pi f = \lim_{n\to\infty} \pi f^{(n)}$.

In the examples of Lebesgue integration, we noted that non-negative row vectors are measures and column vectors are functions. Functions are integrated by forming the matrix product of the measure and the function. Thus, the theorems of Section 4 immediately give us the following four results. In each of them it should be borne in mind that:

(1) There is a corresponding result if row vectors are thought of as functions and column vectors are thought of as measures.

(2) These results imply corresponding results about matrices which are not just row or column vectors. (Recall the discussion in Section 1 about proving matrix equalities entry-by-entry.)


Proposition 1-53: Let $\pi \ge 0$ be a row vector and suppose $\{f^{(k)}\}$ is a sequence of column vectors converging entry-by-entry to $f$ and satisfying
$$0 \le f^{(1)} \le f^{(2)} \le \cdots.$$
Then $\pi f = \lim_k \pi f^{(k)}$.

PROOF: This result is a restatement of the Monotone Convergence Theorem as it applies to matrices.

Corollary 1-54: Let $\pi \ge 0$ be a row vector and suppose $\{f^{(k)}\}$ is a sequence of non-negative column vectors such that
$$f = \sum_{k=1}^{\infty} f^{(k)}.$$
Then
$$\pi f = \sum_{k=1}^{\infty} \pi f^{(k)}.$$

PROOF: This corollary is immediate from Corollary 1-46.

Proposition 1-55: Let $\pi \ge 0$ be a row vector and suppose $\{f^{(k)}\}$ is a sequence of non-negative column vectors. Then
$$\pi \big(\liminf_k f^{(k)}\big) \le \liminf_k \big(\pi f^{(k)}\big).$$
If $f = \lim_k f^{(k)}$ exists, then $\pi f \le \liminf_k (\pi f^{(k)})$.

PROOF: This is Fatou’s Theorem.

Proposition 1-56: Let $\pi \ge 0$ be a row vector such that $\pi\mathbf{1}$ is finite. If $\{f^{(k)}\}$ is a sequence of column vectors such that $|f^{(k)}| \le c\mathbf{1}$ and $f = \lim_k f^{(k)}$ exists, then
$$\pi f = \lim_k \pi f^{(k)}.$$

PROOF: The result follows from Corollary 1-50.


A harder problem arises with a sequence of non-negative row vectors $\pi^{(k)}$ converging to a row vector $\pi$. It is not sufficient for $\pi^{(k)}\mathbf{1} \le M$ and $|f| \le c\mathbf{1}$ in order for $\pi f = \lim_k \pi^{(k)} f$. For set
$$\pi^{(1)} = (1\ 0\ 0\ 0\ \ldots)$$
$$\pi^{(2)} = (0\ 1\ 0\ 0\ \ldots)$$
$$\pi^{(3)} = (0\ 0\ 1\ 0\ \ldots)$$
and so on, and take $f = \mathbf{1}$. Then $\pi = 0$, so that $\pi f = 0$, while $\lim_k \pi^{(k)} f = 1$. We give two sufficient conditions for
$$\pi f = \lim_k \pi^{(k)} f;$$
our integration theorems do not provide us with quick proofs, however. The second proposition is closely related to the Silverman-Toeplitz Theorem on summability methods.
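The counterexample with the vectors $\pi^{(k)}$ can be checked mechanically. In the Python sketch below (helper names `pi_k` and `times_ones` are hypothetical) each $\pi^{(k)}$ is stored through its first `length` entries; since $\pi^{(k)}$ has a single nonzero entry at position $k$, a truncation with `length >= k` gives $\pi^{(k)}\mathbf{1}$ exactly. Each fixed entry $\pi^{(k)}_j$ is eventually 0, so the entrywise limit $\pi$ is the zero vector, while $\pi^{(k)}\mathbf{1} = 1$ for every $k$.

```python
def pi_k(k, length):
    """First `length` entries of pi^(k): 1 in position k (1-based), 0 elsewhere."""
    return [1 if j == k else 0 for j in range(1, length + 1)]


def times_ones(row):
    """The product pi^(k) 1 for a finitely supported row vector."""
    return sum(row)


LENGTH = 100
products = [times_ones(pi_k(k, LENGTH)) for k in range(1, LENGTH + 1)]
entry_5 = [pi_k(k, LENGTH)[5 - 1] for k in range(1, LENGTH + 1)]
print(set(products))        # {1}: pi^(k) 1 = 1 for every k
print(sum(entry_5[5:]))     # 0: the 5th entry vanishes for all k > 5
```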


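Proposition 1-57: If $\{\pi^{(k)}\}$ is a sequence of non-negative row vectors converging to $\pi$, if $f$ is a column vector such that $0 \le f \le c\mathbf{1}$ for some $c$, and if $\pi\mathbf{1} = \lim_k \pi^{(k)}\mathbf{1}$ with $\pi\mathbf{1}$ finite, then
$$\pi f = \lim_k \pi^{(k)} f.$$

PROOF: Take $f$ as a measure and $\{\pi^{(k)}\}$ as a sequence of non-negative functions and apply Fatou’s Theorem. We have
$$\pi f \le \liminf_k \pi^{(k)} f.$$
With $c\mathbf{1} - f$ as a measure and $\{\pi^{(k)}\}$ as a sequence of functions, Fatou’s Theorem gives
$$\pi(c\mathbf{1} - f) \le \liminf_k \pi^{(k)}(c\mathbf{1} - f).$$
Since $\pi\mathbf{1}$ is finite and $\lim_k \pi^{(k)}\mathbf{1} = \pi\mathbf{1}$, we find
$$-\pi f \le \liminf_k \big(-\pi^{(k)} f\big)$$
or
$$\pi f \ge \limsup_k \pi^{(k)} f.$$
Together with the first inequality, this gives $\pi f = \lim_k \pi^{(k)} f$.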
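Proposition 1-58: Let $\{\pi^{(k)}\}$ be a sequence of row vectors converging to $\pi$ and satisfying $|\pi^{(k)}|\mathbf{1} \le M$. Suppose $f$ is a column vector with the property that for any $\delta > 0$ only finitely many entries of $f$ have absolute value greater than $\delta$. Then
$$\pi f = \lim_k \pi^{(k)} f.$$

PROOF: The entries of $f$ are clearly bounded, say by $c$. Numbering the entries, we have for every $N$
$$|\pi^{(k)} f - \pi f| \le \sum_{j=1}^{N} |\pi_j^{(k)} - \pi_j|\,|f_j| + \sum_{j>N} |\pi_j^{(k)}|\,|f_j| + \sum_{j>N} |\pi_j|\,|f_j|.$$
Let $\epsilon > 0$ be given. Choose $N$ sufficiently large that $|f_j| < \epsilon/(4M)$ for $j > N$. Choose $k$ sufficiently large that $|\pi_j^{(k)} - \pi_j| < \epsilon/(2cN)$ for $j \le N$. Since $|\pi|\mathbf{1} \le M$ by Fatou’s Theorem, the three sums are at most $\epsilon/2$, $\epsilon/4$, and $\epsilon/4$, respectively. Then $|\pi^{(k)} f - \pi f| \le \epsilon$ for all sufficiently large $k$, and the result is established.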


As we noted in Section 1, results about general denumerable matrices can be reduced to results about row and column vectors. In particular, the propositions of the present section apply equally well to sequences of the forms $\{Af\}$ and $\{\pi A\}$.


6. Some general theorems from analysis 


In this section we collect a variety of results from analysis which we 
shall need in later chapters. We prove only some of them. At first 
reading the reader may find it wise to skip this section, returning to it 
later as the material is required. 


a. Stirling’s formula. Stirling’s formula gives an asymptotic expression for $m!$. The approximation is
$$m! \sim \frac{m^m}{e^m} \sqrt{2\pi m},$$
where the symbol $\sim$ indicates that the ratio of the two quantities tends to one as $m$ increases. For a proof, see Courant and Hilbert [1953], pp. 522-524. Stirling’s formula immediately gives an approximation for the binomial coefficient $\binom{rn}{n}$ for large $n$. The coefficient $\binom{a}{b}$ is defined as $\frac{a!}{b!\,(a-b)!}$.

Lemma 1-59: For $r > 1$,
$$\binom{rn}{n} \sim \frac{c}{\sqrt{n}} \left( \frac{r^r}{(r-1)^{r-1}} \right)^n,$$
where $c$ is a constant depending on $r$ but not on $n$.
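Substituting Stirling’s formula into the three factorials of $\binom{rn}{n} = \frac{(rn)!}{n!\,((r-1)n)!}$ gives the value $c = \sqrt{r/(2\pi(r-1))}$ for the constant in Lemma 1-59; for $r = 2$ this is the familiar $\binom{2n}{n} \sim 4^n/\sqrt{\pi n}$. The Python sketch below (function names are illustrative, and integer $r$ is assumed so that $rn$ is an integer) checks numerically that both ratios approach 1, working with logarithms via `math.lgamma` to avoid overflow.

```python
import math


def stirling_ratio(m):
    """m! divided by (m/e)**m * sqrt(2*pi*m), computed with logarithms."""
    log_factorial = math.lgamma(m + 1)
    log_approx = m * math.log(m) - m + 0.5 * math.log(2 * math.pi * m)
    return math.exp(log_factorial - log_approx)


def binomial_ratio(r, n):
    """C(rn, n) divided by (c / sqrt(n)) * (r**r / (r-1)**(r-1))**n,
    with c = sqrt(r / (2*pi*(r-1))); r > 1 is an integer here."""
    log_binom = (math.lgamma(r * n + 1) - math.lgamma(n + 1)
                 - math.lgamma((r - 1) * n + 1))
    c = math.sqrt(r / (2 * math.pi * (r - 1)))
    log_approx = (math.log(c) - 0.5 * math.log(n)
                  + n * (r * math.log(r) - (r - 1) * math.log(r - 1)))
    return math.exp(log_binom - log_approx)


for m in (10, 100, 1000):
    print(m, stirling_ratio(m), binomial_ratio(2, m), binomial_ratio(3, m))
```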
The multinomial coefficient $\binom{a}{b, c, \ldots, d}$, where $b + c + \cdots + d = a$, is defined to be
$$\binom{a}{b, c, \ldots, d} = \frac{a!}{b!\,c!\cdots d!}.$$
b. Difference equations. An nth order linear difference equation 
with constant coefficients is an expression of the form 
Yren + Cn-1Yr+n-1 t't + C1YK41 + CoYk = Te 


where y, and rẹ are functions defined on the integers and where the 
Cn-1»- - -, Co are complex numbers. The equation is homogeneous if 
r, =0 and nonhomogeneous otherwise. For a nonhomogeneous 
solution, we refer to any single function y, satisfying the equation as a 
particular solution, and we call the set of functions satisfying the same 
equation with 7, = 0 the homogeneous solution. The totality of solu- 
tions to any difference equation is known as the general solution. 


Proposition 1-60: Every solution of the difference equation 


Yon + Cn-1Yr+n-1 $7 + Coy, = 0 


is a linear combination of n fixed functions, obtained as follows: If a 
is a root of multiplicity q of the characteristic equation 


x" + C,_ yu" +- t 65% + Co = O, 
then g of the functions are 
af, kat, k7a®,..., kalak. 


Conversely, each of these functions is a solution of the difference 
equation. Furthermore, each solution of the equation 


$$y_{k+n} + c_{n-1}y_{k+n-1} + \cdots + c_0 y_k = r_k$$

is the sum of a fixed particular solution and some solution of

$$y_{k+n} + c_{n-1}y_{k+n-1} + \cdots + c_0 y_k = 0.$$


Conversely, every such sum is a solution of the nonhomogeneous 
equation. 


For a proof of this proposition, see Hildebrand [1956], pp. 202-203. 
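The sketch below checks Proposition 1-60 on one concrete equation chosen purely for illustration (it does not come from the text): $y_{k+3} - 5y_{k+2} + 8y_{k+1} - 4y_k = r_k$, whose characteristic polynomial factors as $(x-1)(x-2)^2$. The simple root 1 contributes the function $1$, and the double root 2 contributes $2^k$ and $k\,2^k$. For the nonhomogeneous case we take $r_k = 3^k$, for which trying $A\cdot 3^k$ gives the particular solution $\tfrac12\cdot 3^k$. Exact rational arithmetic keeps every residual exactly zero.

```python
from fractions import Fraction

# Coefficients [c_0, ..., c_{n-1}] of y_{k+n} + c_{n-1} y_{k+n-1} + ... + c_0 y_k
COEFFS = [-4, 8, -5]          # x^3 - 5x^2 + 8x - 4 = (x - 1)(x - 2)^2
ROOTS = {1: 1, 2: 2}          # root a -> multiplicity q

def lhs(y, k, coeffs=COEFFS):
    """Left side of the difference equation evaluated at index k."""
    n = len(coeffs)
    return y(k + n) + sum(c * y(k + i) for i, c in enumerate(coeffs))

def basis(roots=ROOTS):
    """The functions a^k, k a^k, ..., k^(q-1) a^k of Proposition 1-60."""
    return [lambda k, a=a, j=j: k**j * a**k
            for a, q in roots.items() for j in range(q)]

def particular(k):
    """A particular solution for r_k = 3^k (found by trying A * 3^k)."""
    return Fraction(3**k, 2)

def general(weights):
    """Particular solution plus a linear combination of the basis functions."""
    fs = basis()
    return lambda k: particular(k) + sum(w * f(k) for w, f in zip(weights, fs))

y = general([7, -2, Fraction(1, 3)])
print([lhs(y, k) for k in range(5)])   # should equal [3**k for k in range(5)]
```

Changing the weights passed to `general` changes only the homogeneous part, so the left side stays equal to $3^k$, which is the "fixed particular solution plus homogeneous solution" statement of the proposition.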


c. Cesaro summability and Abel summability. Let $\{a_n\}$ be a sequence
of real numbers. Define $b_n$ to be the arithmetic mean of the first $n$
terms of the sequence $\{a_n\}$, that is, $b_n = (a_1 + \cdots + a_n)/n$. The sequence $\{b_n\}$ is called the sequence of
Cesaro averages of the sequence $\{a_n\}$. The sequence $\{a_n\}$ is said to be
Cesaro summable if its sequence $\{b_n\}$ of Cesaro averages has a limit.
If $A^{(n)}$ is a sequence of matrices, the sequence of Cesaro averages $B^{(n)}$
is defined entry-by-entry: $B^{(n)}_{ij}$ is the Cesaro average of $A^{(1)}_{ij}, A^{(2)}_{ij}, \ldots,
A^{(n)}_{ij}$. The basic fact about Cesaro summability is the following
proposition.


Proposition 1-61: If a sequence $\{a_n\}$ converges to a limit $L$, then the
sequence of Cesaro averages $\{b_n\}$ converges and its limit is $L$.


Proof: Let $f$ be the column vector whose $n$th entry is $a_n - L$, and
let $\pi^{(n)}$ be the row vector defined by

$$\pi^{(n)}_j = \begin{cases} 1/n & \text{for } 1 \le j \le n, \\ 0 & \text{for } j > n. \end{cases}$$

Then

$$b_n = \pi^{(n)} f + L.$$

Now $|\pi^{(n)}|\mathbf{1} = 1$, $\lim_n \pi^{(n)} = 0$, and $f$ is a column vector with entries
tending to 0. Hence by Proposition 1-58, $\lim_n \pi^{(n)} f = 0$. Therefore
$\lim_n b_n = L$.


The converse of Proposition 1-61 is false. The sequence $\{a_n\}$ defined
by $a_{2n} = 0$, $a_{2n+1} = 1$ does not converge, but it is Cesaro summable: its Cesaro averages tend to $\tfrac{1}{2}$.
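A small sketch of Cesaro averaging, assuming the indexing $b_n = (a_1 + \cdots + a_n)/n$ used in the proof of Proposition 1-61. The first sequence is the non-convergent example just given; the second, $a_n = 1 + 1/n$, is an arbitrary convergent sequence chosen for illustration.

```python
def cesaro_averages(a, n_max):
    """Return [b_1, ..., b_{n_max}] with b_n = (a(1) + ... + a(n)) / n."""
    total, out = 0.0, []
    for n in range(1, n_max + 1):
        total += a(n)
        out.append(total / n)
    return out

def alternating(n):
    """a_{2n} = 0, a_{2n+1} = 1: the sequence itself has no limit."""
    return n % 2

def convergent(n):
    """a_n = 1 + 1/n, which converges to 1."""
    return 1 + 1 / n

b_alt = cesaro_averages(alternating, 1001)
b_conv = cesaro_averages(convergent, 10_000)
print(b_alt[999], b_alt[1000])   # b_1000 and b_1001, both close to 1/2
print(b_conv[-1])                # close to 1, as Proposition 1-61 predicts
```

The averages of the convergent sequence approach its limit only slowly (here $b_n = 1 + (1 + \tfrac12 + \cdots + \tfrac1n)/n$), which is typical: averaging never speeds up convergence, it only smooths oscillation.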

Let $\{c_n\}$ be a sequence of real numbers. (In most applications the
partial sums $c_0 + \cdots + c_n$ are assumed bounded.) If the limit

$$\lim_{t \uparrow 1} \sum_{n=0}^{\infty} c_n t^n$$

exists, the limit is called the Abel sum of the series $\sum c_n$, and the series
is said to be Abel summable. Abel’s Theorem is the following result.


Proposition 1-62: If the series $\sum c_n$ converges to a finite limit $L$, then
it is Abel summable and its Abel sum is $L$.


Proof: Since the partial sums converge to a finite limit, the $c_n$ are
bounded and $\sum c_n t^n$ converges absolutely for $t < 1$. Let $\{t_k\}$ be any
sequence of positive reals less than one and increasing to one. Let $f$
be the column vector whose $n$th entry is $(c_0 + \cdots + c_n) - L$, and let
$\pi^{(k)}$ be the row vector defined by

$$\pi^{(k)}_n = (1 - t_k)\,t_k^{\,n}.$$

Then, since $(1 - t_k)\sum_n (c_0 + \cdots + c_n)\,t_k^{\,n} = \sum_n c_n t_k^{\,n}$,

$$\pi^{(k)} f = \sum_n c_n t_k^{\,n} - L.$$

Now $|\pi^{(k)}|\mathbf{1} = 1$, $\lim_k \pi^{(k)} = 0$, and $f$ is a column vector with entries
tending to 0. Hence by Proposition 1-58, $\lim_k \pi^{(k)} f = 0$. That is,

$$\lim_{k \to \infty} \sum_n c_n t_k^{\,n} = L.$$

This equality for every such sequence $\{t_k\}$ implies that

$$\lim_{t \uparrow 1} \sum_n c_n t^n = L.$$
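To see Abel summation at work, the sketch below (an illustration, not from the text) evaluates $\sum_n c_n t^n$ for $t$ close to 1, truncating the series once the terms are negligible. For $c_n = (-1)^n$ the partial sums are bounded but the series diverges; its Abel sum is $\lim_{t \uparrow 1} 1/(1+t) = \tfrac12$. For $c_n = (-1)^n/(n+1)$ the series converges to $\log 2$, and Proposition 1-62 says the Abel sum must agree with it; here $\sum_n c_n t^n = \log(1+t)/t$.

```python
import math

def abel_partial(c, t, terms=100_000):
    """Truncated power series sum_{n < terms} c(n) * t**n for 0 < t < 1."""
    total, power = 0.0, 1.0
    for n in range(terms):
        total += c(n) * power
        power *= t
    return total

def grandi(n):
    """1 - 1 + 1 - ...: divergent, but with bounded partial sums."""
    return (-1) ** n

def alt_harmonic(n):
    """1 - 1/2 + 1/3 - ...: converges to log 2."""
    return (-1) ** n / (n + 1)

for t in (0.9, 0.99, 0.999):
    print(t, abel_partial(grandi, t), abel_partial(alt_harmonic, t))
print(math.log(2))
```

The truncation length is an assumption that is adequate only because $t^{100000}$ is negligible for the values of $t$ used; values of $t$ much closer to 1 would need more terms.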


d. Convergent subsequences of matrices. A bounded sequence of 
real numbers has a subsequence which converges to a finite limit. We 
shall obtain a generalization of this result to matrices. 


Proposition 1-63: Let $\{A^{(n)}\}$ be a sequence of matrices with the
property that for some pair of real numbers $c$ and $d$, $cE \le A^{(n)} \le dE$
for all $n$. Then there exists a subsequence of matrices $\{A^{(n_k)}\}$ which
converges in every entry.


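Proof: Since there are only denumerably many entries in each
matrix, they can be numbered by a subset of the positive integers.
Each entry forms a bounded sequence of real numbers and so has a convergent subsequence.
Select a subsequence $A^{(1,1)}, A^{(1,2)}, \ldots$ of $\{A^{(n)}\}$ which converges in the first entry. Let
$A^{(2,1)}, A^{(2,2)}, \ldots$ be a subsequence of $A^{(1,1)}, A^{(1,2)}, \ldots$ which converges
in the second entry. In general, let $A^{(m,1)}, A^{(m,2)}, \ldots$ be a subsequence
of $A^{(m-1,1)}, A^{(m-1,2)}, \ldots$ which converges in the $m$th entry. Finally set

$$A^{(n_k)} = A^{(k,k)}.$$

For each $m$, the terms $A^{(k,k)}$ with $k \ge m$ form a subsequence of $A^{(m,1)}, A^{(m,2)}, \ldots$ and therefore converge in the $m$th entry. Hence $\{A^{(n_k)}\}$ converges in every entry.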


Corollary 1-64: Let $\{A^{(n)}\}$ be a sequence of matrices with the property
that $cE \le A^{(n)} \le dE$ for all $n$. Then $\lim_n A^{(n)} = A$ exists if and only
if $\lim_k A^{(n_k)} = A$ for every convergent subsequence $\{A^{(n_k)}\}$.


Proof: The necessity of the condition is trivial. For the sufficiency
suppose $\lim_n A^{(n)}_{ij}$ does not exist for some entry. Then $\liminf_n A^{(n)}_{ij} < \limsup_n A^{(n)}_{ij}$.
Pick a subsequence of $\{A^{(n)}\}$ that converges in the $ij$th entry to
$\limsup_n A^{(n)}_{ij}$, and do the same for $\liminf_n A^{(n)}_{ij}$. Apply Proposition 1-63
to extract subsequences convergent in all entries from these sequences.
Their limits differ in the $ij$th entry, so they cannot both equal $A$, contrary to the hypothesis.


e. Sets of positive integers closed under addition. The greatest 
common divisor of a non-empty set of positive integers is the largest 
integer which divides all of them. If the set consists of $\{n_1, n_2, \ldots\}$,
its greatest common divisor is denoted $(n_1, n_2, \ldots)$.


Lemma 1-65: If T is a set of positive integers with greatest common 
divisor d, then there exists a finite subset of T for which d is the 
greatest common divisor. 
Proof: Let $n_1 = k_1 d$ be an element of $T$. If $k_1 = 1$, $\{n_1\}$ is the
required set. If not, choose $n_2 \in T$ such that $n_1 \nmid n_2$; such an element exists, since otherwise $n_1$ would be a common divisor of $T$ larger than $d$. Then $(n_1, n_2) =
k_2 d$ with $k_2 < k_1$. If $k_2 = 1$, $\{n_1, n_2\}$ is the required set. Otherwise,
find $n_3$ such that $k_2 d \nmid n_3$, and set $(n_1, n_2, n_3) = k_3 d$ with $k_3 < k_2$.
Continuing in this way, we obtain a decreasing sequence of integers
$k_1, k_2, \ldots$ bounded below by 1. It must terminate, and then we have
constructed the finite set.
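The construction in this proof is effectively an algorithm: scan elements of $T$ and keep an element only when the current greatest common divisor does not divide it, so that the greatest common divisor strictly decreases. The sketch below runs it on a finite list standing in for an initial part of some enumeration of $T$; the list is an arbitrary example, and the function name is ours.

```python
from math import gcd

def gcd_reducing_subset(elements):
    """Follow the proof of Lemma 1-65 on a finite list of positive integers.

    Returns the chosen elements and the successive gcds k_1 d > k_2 d > ...
    """
    chosen, gcds = [], []
    for n in elements:
        if not gcds or n % gcds[-1] != 0:   # current gcd does not divide n
            chosen.append(n)
            gcds.append(gcd(gcds[-1], n) if gcds else n)
    return chosen, gcds

chosen, gcds = gcd_reducing_subset([30, 60, 12, 90, 20, 45, 75])
print(chosen, gcds)
```

Elements divisible by the current greatest common divisor (60, 90, 75 here) cannot lower it and are skipped, mirroring the choice of $n_2, n_3, \ldots$ in the proof.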


Lemma 1-66: Let T be a non-empty set of positive integers which is 
closed under addition and which has the greatest common divisor d. 
Then all sufficiently large multiples of d are in the set T. 


Proof: If $d \ne 1$, divide all the elements in $T$ by $d$ and reduce the
problem to the case $d = 1$. By Lemma 1-65 there is a finite subset
$\{n_1, \ldots, n_r\}$ of $T$ with greatest common divisor 1. Then there exist
integers $c_1, \ldots, c_r$ with the property

$$c_1 n_1 + \cdots + c_r n_r = 1.$$

If we collect the positive terms and the negative terms and note that $T$
is closed under addition, we find that $T$ contains non-negative integers
$m$ and $n$ with $m - n = 1$. Suppose $q \ge n(n - 1)$. Then $q = an + b$
with $a \ge n - 1$ and $0 \le b \le n - 1$. Since $b \le n - 1 \le a$, the coefficient $a - b$ is non-negative, and

$$q = (a - b)n + bm,$$

so $q$ is in $T$.
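For a concrete feel for Lemma 1-66, the sketch below generates the additive closure of a finite set of positive integers up to a chosen bound and reports the point beyond which every multiple of $d$ belongs to it. The generators $\{6, 10, 15\}$ (greatest common divisor 1) and the bound 300 are illustrative assumptions; the bound must be large enough for the reported point to be meaningful. It also checks the cruder threshold $n(n-1)$ from the proof, using $m = 16 = 6 + 10$ and $n = 15$, which satisfy $m - n = 1$.

```python
from functools import reduce
from math import gcd

def closure_up_to(gens, limit):
    """members[k] is True iff k (1 <= k <= limit) is a sum of one or more gens."""
    members = [False] * (limit + 1)
    for k in range(1, limit + 1):
        members[k] = any(k == g or (k > g and members[k - g]) for g in gens)
    return members

def threshold(gens, limit):
    """Return (d, s): d is the gcd of gens, and every multiple of d in [s, limit]
    lies in the closure (s is one step past the largest missing multiple)."""
    d = reduce(gcd, gens)
    members = closure_up_to(gens, limit)
    missing = [k for k in range(d, limit + 1, d) if not members[k]]
    return d, (missing[-1] + d if missing else d)

gens, limit = [6, 10, 15], 300
members = closure_up_to(gens, limit)
m, n = 16, 15                                   # both in the closure, m - n = 1
print(threshold(gens, limit))                   # (gcd, start of the full tail)
print(all(members[q] for q in range(n * (n - 1), limit + 1)))
```

The bound $n(n-1) = 210$ produced by the proof is far from sharp here (the closure already contains every integer from 30 on), but the lemma only needs some finite threshold.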


f. Renewal theorem. The Renewal Theorem, one of the important 
results in the elementary theory of probability, can be stated purely 
in terms of analysis. 


Theorem 1-67: Let $\{f_n\}$ be a sequence of non-negative real numbers
such that $\sum f_n = 1$ and $f_0 = 0$, and suppose the greatest common
divisor of those indices $k$ for which $f_k > 0$ is 1. Set $\mu = \sum n f_n$, $u_0 = 1$,
and $u_n = \sum_{k=1}^{n} f_k u_{n-k}$ for $n \ge 1$. If $\mu$ is infinite, then $\lim_n u_n = 0$, and if $\mu$ is
finite, then $\lim_n u_n = 1/\mu$.


For a proof, see Feller [1957], pp. 306-307. 
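The sequence $\{u_n\}$ is easy to compute from its recursion. The sketch below uses distributions chosen purely for illustration: first $f_1 = f_2 = \tfrac12$, so that $\mu = \tfrac32$ and the theorem predicts $u_n \to \tfrac23$; then $f_2 = 1$, where the greatest common divisor of the support is 2 and $u_n$ alternates between 1 and 0, showing why that hypothesis is needed.

```python
from fractions import Fraction

def renewal_sequence(f, n_max):
    """u_0 = 1 and u_n = sum_{k=1}^{n} f_k u_{n-k}; f maps k -> f_k (missing = 0)."""
    u = [Fraction(1)]
    for n in range(1, n_max + 1):
        u.append(sum(f.get(k, 0) * u[n - k] for k in range(1, n + 1)))
    return u

aperiodic = {1: Fraction(1, 2), 2: Fraction(1, 2)}
mu = sum(k * p for k, p in aperiodic.items())        # mean 3/2
u = renewal_sequence(aperiodic, 40)
print(float(u[40]), float(1 / mu))                    # both about 0.6667

periodic = {2: Fraction(1)}                           # gcd of the support is 2
print(renewal_sequence(periodic, 7))                  # 1, 0, 1, 0, ...: no limit
```

In the first example one can solve the recursion exactly, $u_n = \tfrac23 + \tfrac13(-\tfrac12)^n$, so the convergence to $1/\mu$ is geometric.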


g. Central Limit Theorem. Identically distributed independent 
random variables are defined in Sections 3-2 and 6-4. The mean of a 
random variable is its integral over its domain, and its variance is the 
integral of its square minus the square of its integral, a quantity which 
is always non-negative. 
Theorem 1-68: Let $\{y_n\}$ be a sequence of identically distributed
independent random variables with common mean $\mu$ and with common
finite variance $\sigma^2 > 0$. Set $S_m = y_1 + \cdots + y_m$. Then for all real $\alpha$
and $\beta$ with $\alpha < \beta$,

$$\lim_{m \to \infty} \Pr\left[\alpha < \frac{S_m - m\mu}{\sigma\sqrt{m}} < \beta\right] = \Phi(\beta) - \Phi(\alpha),$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du.$$


For a proof, see Doob [1953], p. 140. 
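As an illustration (not from the text), take $y_i$ to be the outcomes of tosses of a fair coin scored 0 or 1, so that $\mu = \tfrac12$ and $\sigma = \tfrac12$. Then $S_m$ is binomial, the probability in Theorem 1-68 can be computed exactly with integer arithmetic, and it can be compared with $\Phi(\beta) - \Phi(\alpha)$, where we use the standard identity $\Phi(x) = \tfrac12\bigl(1 + \operatorname{erf}(x/\sqrt2)\bigr)$.

```python
import math

def Phi(x):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def exact_coin_probability(m, alpha, beta):
    """Pr[alpha < (S_m - m/2) / (sqrt(m)/2) < beta] for S_m ~ Binomial(m, 1/2)."""
    scale, centre = math.sqrt(m) / 2, m / 2
    ks = [k for k in range(m + 1) if alpha < (k - centre) / scale < beta]
    return sum(math.comb(m, k) for k in ks) / 2 ** m   # exact integers, then float

for m in (100, 1000, 10_000):
    print(m, exact_coin_probability(m, -1.0, 1.0), Phi(1.0) - Phi(-1.0))
```

The exact probabilities approach the normal value as $m$ grows; for small $m$ the discreteness of $S_m$ is still clearly visible.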


CHAPTER 2 


STOCHASTIC PROCESSES 


1. Sequence spaces 


We shall introduce the concept of a stochastic process in this chapter 
and develop the basic tools needed to treat the processes. Before the 
formal development, we shall indicate the intuitive ideas underlying 
the formal definitions. 

We imagine that a sequence of experiments is performed. The 
outcomes may be arbitrary elements of a specified set, such as the set 
{“yes,” “no,” “no opinion”}, the set {heads, tails}, the set {fair, 
cloudy, rainy, snowy}, or a set of numbers. The experiments may be 
quite general in nature, but we impose some natural restrictions: 


(1) The set of possible outcomes is denumerable. (This restriction 
is natural for the present book. It would be removed in a more 
general treatment of stochastic processes.) 

(2) The probability of an outcome for the nth experiment is com- 
pletely determined by a knowledge of the outcomes of earlier experi- 
ments. Here “probability” is used heuristically, to motivate the 
later precise definition. 

(3) The experimenter is, at each stage, aware of the outcomes of 
earlier experiments. 


We shall first consider a sequence of n experiments, where n is 
specified at the beginning. Later we shall consider a denumerably 
infinite sequence of experiments. In each case we assume that the 
experiments do not stop earlier. However, this is an unimportant 
restriction; a process that terminates may be represented in our 
framework by allowing the outcome “stopped.” The following are 
examples of such sequences of experiments: 


(1) A sociologist wishes to find out whether people feel that television
is turning us into a nation of illiterates. He asks a carefully selected
sample of subjects and receives the answer “yes,” “no,” or “no
opinion” in each case.

(2) A gambler flips a coin repeatedly, recording “heads” or “tails.”

(3) A meteorologist records the daily weather for 23 years, classifying 
each day as “fair,” “cloudy,” “rainy,” or “snowy.” 

(4) A physicist tries to determine the speed of light by making a 
series of measurements. (Since each measurement is recorded only to 
a certain number of decimal places, the possible outcomes are rational 
numbers, and hence the set is denumerable.) 

(5) A physicist makes a count of the number of radioactive particles 
given off by an ounce of uranium. A measurement is made every 
second, and the outcome of the nth measurement is the total number of 
particles given off until then. 


The exact way in which probabilities are determined from an experi- 
ment is a deep problem in the philosophy of science, and it will not 
concern us here. We will assume that the nature of the experiment 
yields us certain probabilities, namely the probability that the nth 
experiment results in an outcome $c_n$, given that the previous experiments
resulted in outcomes $c_0, c_1, \ldots, c_{n-1}$. We then design a probability
space in which one can compute the probability of a wide variety of 
statements concerning the experiments and in which the specified 
(conditional) probabilities turn out as given. 

The elements of our probability space $\Omega$ are sequences of possible
outcomes for the experiments (either sequences of length $n$, for a finite
series of experiments, or infinite sequences). The elements of the Borel
field $\mathcal{B}$ of measurable sets will be the truth sets of statements to which
probabilities are to be assigned. (The truth set of a statement about
the experiments is the set of all those sequences in $\Omega$ for which the
statement is true.) A measure $\mu$ is constructed, and the probability
of a statement is the measure of its truth set. In particular, $\mu(\Omega) = 1$
in a probability space, and hence the probability of a logically true
statement is 1.

Let us first consider the case where $n$ experiments are performed.
The possible outcomes are conveniently represented by a tree, with
each path through the tree representing a sequence of possible
outcomes. In the diagram $n$ equals 3, $\Omega$ has 8 elements, and $\mathcal{B}$ consists
of all subsets of $\Omega$.

The numbers on the branches, known as branch weights, represent
the conditional probabilities mentioned above. For example, $p_2$ is the
probability of heads, given that the first toss came up heads, while
$1 - p_3$ is the probability that tails is the outcome that follows two
heads. The weight assigned to the path HHT is taken to be the
product of the branch weights, $p_1 p_2 (1 - p_3)$. The measure $\mu(A)$ of a
set of paths $A$ is the sum of the weights of the paths in $A$. In the usual
setup for coin tossing, each $p$ is $\tfrac{1}{2}$, and the weights of the paths are
each $\tfrac{1}{8}$.

The branch weights may be arbitrary non-negative numbers, but the 
sum of the weights of branches starting from a given branch-point must 
be one. 
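The path weights of a finite tree can be computed directly from the branch weights. In the sketch below the branch weights are supplied by a function `branch_weight(prefix, outcome)` (a hypothetical interface of ours) giving the weight of the branch labelled `outcome` that leaves the branch-point reached by `prefix`; for the usual coin-tossing setup it is always $\tfrac12$. The measure of a set of paths is then the sum of the weights of its paths.

```python
from fractions import Fraction
from itertools import product

S = ("H", "T")

def path_weight(path, branch_weight):
    """Product of the branch weights along `path` (a tuple of outcomes)."""
    w = Fraction(1)
    for i, outcome in enumerate(path):
        w *= branch_weight(path[:i], outcome)
    return w

def measure(paths, branch_weight):
    """mu(A) for a set A of complete paths: the sum of their weights."""
    return sum((path_weight(p, branch_weight) for p in paths), Fraction(0))

def fair(prefix, outcome):
    """Usual coin tossing: every branch weight is 1/2."""
    return Fraction(1, 2)

tree = list(product(S, repeat=3))                          # Omega_3: 8 paths
print(path_weight(("H", "H", "T"), fair))                  # p1 * p2 * (1 - p3) = 1/8
print(measure(tree, fair))                                 # total weight 1
print(measure([p for p in tree if p[0] == "H"], fair))     # first toss heads: 1/2
```

Any other `branch_weight` works as long as the weights leaving each branch-point are non-negative and sum to one; that requirement is what makes the total weight of the tree equal to 1.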

After we define conditional probabilities, it will be easy to verify 
that the numbers written on the branches do indeed turn out to be the 
desired conditional probabilities (see Kemeny, Mirkil, et al. [1959], 
Chapter 3). 

Let us suppose that we have constructed a tree $\Omega_n$ for a series of $n$
experiments. We consider $k$ additional experiments, obtaining a tree
$\Omega_{n+k}$. We wish our probabilities to be consistent in the sense that a
statement $p$ about $\Omega_n$ has the same probability when computed in
either measure space. Our method of computing measures has this
consistency property. This assertion is easily verified for $k = 1$, since the
weights of the branches leaving each branch-point sum to one, and
the result follows by induction.
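The consistency property can be checked numerically for any branch weights that sum to one at each branch-point. The sketch below assumes a hypothetical history-dependent rule, $\Pr[\text{H} \mid \text{prefix}] = (h + 1)/(\ell + 2)$, where $h$ is the number of heads among the $\ell$ outcomes so far, and computes the probability of the statement "the first two tosses agree" on $\Omega_2$ and on the larger trees $\Omega_{2+k}$.

```python
from fractions import Fraction
from itertools import product

def branch_weight(prefix, outcome):
    """Hypothetical rule: Pr[H | prefix] = (heads + 1) / (len(prefix) + 2)."""
    p_heads = Fraction(prefix.count("H") + 1, len(prefix) + 2)
    return p_heads if outcome == "H" else 1 - p_heads

def prob(statement, depth):
    """Probability of `statement` (a predicate on a path) computed on Omega_depth."""
    total = Fraction(0)
    for path in product("HT", repeat=depth):
        w = Fraction(1)
        for i, outcome in enumerate(path):
            w *= branch_weight(path[:i], outcome)
        if statement(path):
            total += w
    return total

def first_two_agree(path):
    return path[0] == path[1]

print([prob(first_two_agree, depth) for depth in (2, 3, 4, 5, 6)])  # all equal
```

Each extra level splits every path into branches whose weights add up to one, so summing over the extra experiments returns the weight of the shorter path; this is the $k = 1$ step of the induction.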

In constructing an infinite tree $\Omega$ for a sequence of experiments, our
measure is required to have the property that a statement about the
first $n$ outcomes has the same probability as if computed with respect
to the finite tree $\Omega_n$. (Of course, the same probability may be
computed with respect to $\Omega_{n+k}$, but the result is the same by consistency.)
This convention assigns probabilities to many simple statements. We
can then show that the probability of a much larger class of statements
is uniquely determined. We will now consider the problem abstractly.

Let $S$ be a denumerable set; $S$ is called the state space. Let $\Omega$ be the
set of all infinite sequences of elements of $S$. A typical element $\omega$ of
$\Omega$ is represented as

$$\omega = (c_0, c_1, c_2, \ldots),$$

where $c_0, c_1, c_2, \ldots$ are elements of $S$. The points $\omega$ of $\Omega$ are called
paths, the whole space $\Omega$ is called a sequence space, and the value $c_n$
on a path $\omega$ is called the $n$th outcome on $\omega$. The function $x_n(\omega)$,
defined from $\Omega$ to $S$ by

$$x_n(c_0, c_1, c_2, \ldots, c_n, \ldots) = c_n,$$

is called the $n$th outcome function, and the $n$th outcome is said to occur
at time $n$.

Let $\mathcal{F}_n$ be the family of all unions of sets in $\Omega$ of the form

$$\{\omega \mid x_0(\omega) \in S_0 \wedge x_1(\omega) \in S_1 \wedge \cdots \wedge x_n(\omega) \in S_n\},$$

where $S_0, S_1, \ldots, S_n$ are subsets of the state space $S$. (Notice that the
sets of $\mathcal{F}_n$ arise from the class of all subsets of the tree $\Omega_n$ described
above.) It is clear that for each $n$, $\mathcal{F}_n$ is a Borel field. Let $\mathcal{F}$ be the
family of sets defined by

$$\mathcal{F} = \bigcup_{n=0}^{\infty} \mathcal{F}_n.$$


Each set in $\mathcal{F}$ is a set of paths for which a finite number of outcomes
are restricted to lie in certain subsets of $S$. All other outcomes are
unrestricted. The reader should verify that $\mathcal{F}$ is a field of sets. In
Section 3 we shall see that $\mathcal{F}$ is not a Borel field; in the meantime, we
let $\mathcal{G}$ be the smallest Borel field containing $\mathcal{F}$ (Proposition 1-14).
After we have defined a measure on $\mathcal{G}$, the Borel field $\mathcal{B}$ which we are
seeking will be the augmented field obtained from $\mathcal{G}$ by adding subsets
of sets of measure zero.

The sets of $\mathcal{F}$ are known as cylinder sets. If a cylinder set $C$ is a
set in $\mathcal{F}_n$, we note that $C$ may be written as the denumerable union

$$C = \bigcup_i B^{(n)}_i$$

of (disjoint) basic cylinder sets

$$B^{(n)}_i = \{\omega \mid x_0(\omega) = c_0 \wedge x_1(\omega) = c_1 \wedge \cdots \wedge x_n(\omega) = c_n\}.$$

A basic cylinder set in $\mathcal{F}_n$ is the set of all possible continuations of a
single path in $\Omega_n$. We let $\nu(B^{(n)}_i)$ equal the product of the branch
weights on this path in $\Omega_n$.

Recalling that the probability measures we assigned to the trees $\Omega_n$ were
defined consistently, we can show that $\nu$ is uniquely defined on the sets
of $\mathcal{F}$. It has the properties that $\nu(\Omega) = 1$ and that the restriction of
$\nu$ to $\mathcal{F}_n$ is a measure for every $n$.

We will next show that $\nu$ can be extended to a measure $\mu$ on the
smallest Borel field containing $\mathcal{F}$. This result will be a consequence of
Theorem 2-4. First we prove a series of lemmas. In each case $\mathcal{F}_n$,
$\mathcal{F}$, and $\nu$ are as defined above.

Lemma 2-1: Let $\nu$ be a set function defined on $\mathcal{F} = \bigcup_n \mathcal{F}_n$ in such a
way that the restriction of $\nu$ to $\mathcal{F}_n$ is a measure for every $n$. Then
$\nu$ is non-negative and additive.


Proof: Non-negativity is trivial. For additivity, let $A$ and $B$ be
disjoint sets in $\mathcal{F}$. Then $A \in \mathcal{F}_m$ and $B \in \mathcal{F}_n$, say. Since the $\mathcal{F}_n$ are
nested, $A$ and $B$ are both in $\mathcal{F}_r$, where $r = \sup(m, n)$. Since $\nu$ is a
measure when restricted to $\mathcal{F}_r$,


$$\nu(A \cup B) = \nu(A) + \nu(B).$$


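We shall in fact establish that $\nu$ is completely additive, a result due
to Kolmogorov.


Lemma 2-2: Suppose $C_0 \supset C_1 \supset C_2 \supset \cdots$ is a sequence of sets in $\mathcal{F}$
such that $C_n \in \mathcal{F}_n$ and $\lim_n \nu(C_n) > 0$. Then for every $m$ there is a
basic cylinder set $B^{(m)}$ in $\mathcal{F}_m$ such that

(1) $\lim_n \nu(C_n \cap B^{(m)}) > 0$,
(2) $B^{(m)} \subset C_m$.

Proof: Let $B^{(m)}_1, B^{(m)}_2, \ldots$ be the basic cylinder sets of $\mathcal{F}_m$. By complete additivity of $\nu$ on $\mathcal{F}_r$, where $r = \sup(m, n)$, we
have $\nu(C_n) = \sum_j \nu(C_n \cap B^{(m)}_j)$, with $\nu(C_n \cap B^{(m)}_j)$ monotonically
decreasing in $n$. Then

$$0 < \lim_n \nu(C_n) = \lim_n \sum_j \nu(C_n \cap B^{(m)}_j) = \sum_j \lim_n \nu(C_n \cap B^{(m)}_j).$$

The interchange of limit and sum is justified by dominated convergence
as follows: The functions of $j$, namely $\nu(C_n \cap B^{(m)}_j)$, satisfy

$$\nu(C_n \cap B^{(m)}_j) \le \nu(C_0 \cap B^{(m)}_j),$$

and we know that $\sum_j \nu(C_0 \cap B^{(m)}_j) = \nu(C_0)$ is finite since $\nu(\Omega) = 1$.
Thus, since a denumerable sum cannot be positive unless one of the
terms is positive, we have for some $j = j_0$

$$\lim_n \nu(C_n \cap B^{(m)}_{j_0}) > 0.$$

Hence (1) is satisfied with $B^{(m)} = B^{(m)}_{j_0}$. But the terms in the sequence $\nu(C_n \cap B^{(m)}_{j_0})$
monotonically decrease to a positive limit, so that

$$\nu(C_m \cap B^{(m)}_{j_0}) > 0.$$

Now $C_m \in \mathcal{F}_m$ and is thus the union of basic cylinder sets of $\mathcal{F}_m$. Since
$C_m \cap B^{(m)}_{j_0}$ cannot be empty, we must have $B^{(m)}_{j_0} \subset C_m$, which is (2).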
Lemma 2-3: Suppose $A_0 \supset A_1 \supset \cdots$ is a sequence of sets in $\mathcal{F}$ such
that $A_n \in \mathcal{F}_n$ and $\lim_n \nu(A_n) = L > 0$. Then there exists a sequence
$\{B^{(n)}\}$ of basic cylinder sets such that

(1′) $B^{(n)}$ is a basic cylinder set of $\mathcal{F}_n$.
(2′) $\lim_m \nu(A_m \cap B^{(n)}) > 0$.
(3′) For $n > 0$, $B^{(n)} \subset B^{(n-1)}$.
(4′) $B^{(n)} \subset A_n$.


Remark: Property (3′) indicates that we are actually constructing a
single path by adjoining branches one at a time.


Proof of Lemma: The construction is by induction. For $n = 0$
apply Lemma 2-2 to the sequence $\{A_n\}$ and the integer $m = 0$. The
set $B^{(0)}$ that the lemma gives is our $B^{(0)}$. Property (2′) follows from (1),
(3′) is vacuous, and (4′) follows from (2). Suppose that we have
constructed $B^{(0)}, \ldots, B^{(n)}$; we want $B^{(n+1)}$. Let

$$C_k = \begin{cases} A_k & \text{for } k \le n, \\ A_k \cap B^{(n)} & \text{for } k > n. \end{cases}$$

The sequence of sets $\{C_k\}$ is decreasing and $C_k \in \mathcal{F}_k$; we have

$$\lim_k \nu(C_k) > 0$$

by (2′) of the inductive hypothesis. Applying Lemma 2-2 to $\{C_k\}$ and
the number $m = n + 1$, we obtain a basic cylinder set of $\mathcal{F}_{n+1}$, which
we take as $B^{(n+1)}$. By (2),

$$B^{(n+1)} \subset C_{n+1} = A_{n+1} \cap B^{(n)}.$$

Hence (1′), (3′), and (4′) hold. By (1),

$$0 < \lim_k \nu(C_k \cap B^{(n+1)}) = \lim_k \nu(A_k \cap B^{(n)} \cap B^{(n+1)}) = \lim_k \nu(A_k \cap B^{(n+1)})$$

by (3′). Hence (2′) holds.
Theorem 2-4: Let $\Omega$ be a sequence space, let $\mathcal{F}_n$ be the Borel field
of sets generated by all statements

$$x_0 = c_0 \wedge \cdots \wedge x_n = c_n,$$

and let $\mathcal{F} = \bigcup_n \mathcal{F}_n$. Suppose for every $n$ there is a probability
measure $\nu_n$ defined on $\mathcal{F}_n$ with the property that the restriction of
$\nu_{n+1}$ to $\mathcal{F}_n$ is $\nu_n$. Let $\nu$ be the set function on $\mathcal{F}$ whose restriction to
$\mathcal{F}_n$ is $\nu_n$ for all $n$. Then $\nu$ is a measure.

Proof: By Lemma 2-1 and the contrapositive of the second half of
Corollary 1-17, it is sufficient to show that if $A_0 \supset A_1 \supset A_2 \supset \cdots$ is
a decreasing sequence of sets in $\mathcal{F}$ with $\lim_n \nu(A_n) = L > 0$, then
$\bigcap_n A_n \ne \emptyset$. Since every set in $\mathcal{F}_n$ is a set in $\mathcal{F}_{n+1}$, we may assume
that $A_n \in \mathcal{F}_n$ by repeating the same set several times in the sequence
and by adding, if necessary, the set $\Omega$ a finite number of times at the
beginning of the sequence. The hypotheses of Lemma 2-3 are satisfied.
We then have $\bigcap_n B^{(n)} = \{\omega\}$, where $\omega$ is a single path of $\Omega$. For every
$n$, $\omega \in B^{(n)} \subset A_n$; hence $\omega \in \bigcap_n A_n$ and $\bigcap_n A_n \ne \emptyset$.


Applying Theorem 1-19 we extend the measure $\nu$ defined on $\mathcal{F}$
uniquely to a measure $\mu$ defined on the Borel field $\mathcal{G}$. Augmenting
the field $\mathcal{G}$, we obtain the Borel field $\mathcal{B}$ with which we shall work.
The extended measure $\mu$ is called tree measure and satisfies $\mu(\Omega) = 1$.
This completes our construction of the sequence space $(\Omega, \mathcal{B}, \mu)$.
From now on when we refer to a sequence space we shall mean $(\Omega, \mathcal{B}, \mu)$
constructed in the indicated manner.


2. Denumerable stochastic processes 


We turn now to the definition of a denumerable stochastic process. 
After the definition we shall show that every sequence space defines in 
a natural way a stochastic process and that every denumerable sto- 
chastic process can, in a way to be described shortly, be considered as a 
sequence space. 

Let $S$ be a denumerable set, which will be called the state space, and
let $(\Omega, \mathcal{B}, \mu)$ be a probability space. Points of $\Omega$ will be denoted $\omega$.


Definition 2-5: Let $\{f_n\}$ be a sequence of functions with domain $\Omega$
and range in $S$, and let $\{\mathcal{B}_n\}$ be a sequence of Borel fields. The pair
$(f_n, \mathcal{B}_n)$ is called a denumerable stochastic process on $\Omega$ if these two
conditions are satisfied:


(1) $\mathcal{B}_0 \subset \mathcal{B}_1 \subset \mathcal{B}_2 \subset \cdots$; and for every $n$, $\mathcal{B}_n \subset \mathcal{B}$.
(2) For every fixed $n$ and for each $s \in S$, the set $\{\omega \mid f_n(\omega) = s\}$ is a
set in $\mathcal{B}_n$.


The second condition in the definition is a measurability requirement
on $f_n$. If we were to think of the family $\mathcal{S}$ of all subsets of $S$ as a Borel
field, our condition would be equivalent to the demand that the
inverse image under $f_n$ of any set in $\mathcal{S}$ be a set in $\mathcal{B}_n$.

First we shall show that every sequence space defines a stochastic
process in a natural way. Let $(\Omega, \mathcal{B}, \mu)$ be a sequence space. We
take the outcome functions $x_n$ as the sequence of functions and $\{\mathcal{F}_n\}$
as the sequence of Borel fields. It is clear that

$$\{\omega \mid x_n(\omega) = c\}$$

is a set of $\mathcal{F}_n$ and that $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F} \subset \mathcal{G} \subset \mathcal{B}$. The pair
$(x_n, \mathcal{F}_n)$ therefore is a stochastic process and is referred to simply as
$\{x_n\}$. We have thus shown that the outcomes of a sequence of
experiments with a denumerable range form a denumerable stochastic
process.

When we begin to discuss Markov chains, we shall wish to confine 
ourselves to stochastic processes defined on a sequence space. There- 
fore, our second task is to show that the restriction to sequence space is 
actually no restriction at all; this will indicate that our treatment of 
denumerable Markov chains is completely general. 

Let $(f_n, \mathcal{B}_n)$ be a denumerable stochastic process on $\Omega'$ with state
space $S$. Let $\mu'$ be a measure on $\Omega'$. We shall construct a sequence
space $(\Omega, \mathcal{B}, \mu)$ in such a way that the behavior of the process $(f_n, \mathcal{B}_n)$
on $\Omega'$ may be studied completely by studying the stochastic process
$\{x_n\}$ on $\Omega$.

Define $\Omega$ to be the space of all sequences of elements of $S$, and let $\mathcal{B}$
be the Borel field obtained by the construction in Section 2-1. We
shall establish a correspondence between paths of $\Omega$ and subsets of $\Omega'$,
and we define a measure $\mu$ on $\mathcal{G}$.

The correspondence we choose is

$$\omega \mapsto \{\omega' \mid f_0(\omega') = x_0(\omega) \wedge \cdots \wedge f_n(\omega') = x_n(\omega) \wedge \cdots\}.$$

To assign a measure to $\Omega$, we must assign a measure to cylinder sets,
such as

$$A = \{\omega \mid x_0(\omega) \in S_0 \wedge \cdots \wedge x_n(\omega) \in S_n\}.$$

Noticing that the set $A'$, defined by

$$A' = \{\omega' \mid f_0(\omega') \in S_0 \wedge \cdots \wedge f_n(\omega') \in S_n\},$$

is a set in $\mathcal{B}_n$ and is therefore $\mu'$-measurable, we define

$$\mu(A) = \mu'(A').$$

The measure $\mu$ can then be extended to a measure defined on all of $\mathcal{B}$,
and the construction of the space $(\Omega, \mathcal{B}, \mu)$ is complete. We thus see
that an arbitrary stochastic process defined on $\Omega'$ may be considered
as a process on a suitable sequence space $\Omega$ in which the $f_n$ are
outcome functions. The probability of any statement concerning the $f_n$
can be computed in the sequence space.
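A standard example (our choice, not the text's) makes this construction concrete: let $\Omega' = [0, 1)$ with Lebesgue measure $\mu'$, and let $f_n(\omega')$ be the $n$th binary digit of $\omega'$, counting from $n = 0$. For a cylinder set $A$, the set $A'$ is a finite union of dyadic intervals, so $\mu(A) = \mu'(A')$ can be computed exactly, and the resulting $\mu$ is the fair coin-tossing measure. The helper names below are hypothetical.

```python
from fractions import Fraction
from itertools import product

def digit(x, n):
    """n-th binary digit (n = 0, 1, ...) of a rational x in [0, 1)."""
    return int(x * 2 ** (n + 1)) % 2

def mu_of_cylinder(sets):
    """mu(A) = mu'(A') for A = {x_0 in S_0, ..., x_n in S_n}, each S_i a subset
    of {0, 1}.  A' is the union of the dyadic intervals of length 2**-(n+1)
    whose first n+1 digits form a word in S_0 x ... x S_n."""
    length = Fraction(1, 2 ** len(sets))
    total = Fraction(0)
    for word in product(*(sorted(s) for s in sets)):
        left = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(word))
        # the interval [left, left + length) consists of points with these digits
        assert all(digit(left + length / 2, i) == b for i, b in enumerate(word))
        total += length
    return total

print(mu_of_cylinder([{1}]))               # Pr[x_0 = 1]
print(mu_of_cylinder([{0}, {1}, {0, 1}]))  # Pr[x_0 = 0 and x_1 = 1]
```

Because every such $A'$ is a union of dyadic intervals, it lies in the Borel sets of $[0, 1)$, which is the measurability the construction requires.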
3. Borel fields in stochastic processes


Probabilities are numbers assigned to statements about stochastic 
processes. We may now formally define the probability of a statement 
to be the measure of the statement’s truth set. In symbols $\Pr[p] =
\mu(P)$, where $P = \{\omega \mid p\}$. If the set $P$ is not a set in the Borel field
on which $\mu$ is defined, then $\Pr[p]$ is undefined. Statements for which
$\Pr[p]$ is defined are called measurable statements.

In a stochastic process we see that a Borel field of sets represents a 
state of knowledge. The more sets there are in the Borel field, the more 
statements there are that we know how to assign probabilities to. Let 
us analyze briefly what this fact implies about Definition 2-5. 

In a denumerable stochastic process we are given an increasing
sequence of Borel fields $\mathcal{B}_n$ such that $\mathcal{B}_n \subset \mathcal{B}$ for every $n$. The field
$\mathcal{B}_n$ represents the state of knowledge of the process up to time $n$. The
fact that the Borel fields are increasing means that our knowledge of
the process never decreases as time goes on, and the fact that all $\mathcal{B}_n$
are contained in $\mathcal{B}$ means that our total knowledge of the process
necessarily includes knowledge of what happens in a finite number of
steps.

Similarly, condition (2) in the definition is an abstract formulation 
of the requirement that in a stochastic process the present does not 
depend upon the future. We conclude, therefore, that our definition 
satisfies the conditions imposed at the beginning of Section 1. 

We shall apply this insight about the role of Borel fields to a specific
example to show that the field $\mathcal{F}$ in Section 1 is not the same as the
Borel field $\mathcal{G}$. Let $\Omega$ be the sequence space constructed when $S$ is taken
as a two-element set. Measures $2^{-(n+1)}$ are assigned to each set $B$ of
paths of the form

$$B = \{\omega \mid x_0 = c_0 \wedge \cdots \wedge x_n = c_n\}.$$

This measure is eventually extended to a measure $\mu$ on the Borel field
$\mathcal{B}$, and we obtain $\mu(\Omega) = 1$. The state space for this example is often
taken as $S = \{\text{heads}, \text{tails}\}$, and the model for the stochastic process is
the tossing of a balanced coin.

For the coin-tossing process a well-known example of a statement
whose truth set is not in the field $\mathcal{F}$ but is in the Borel field $\mathcal{G}$ is involved
in the Strong Law of Large Numbers (which we shall consider in more
detail in Chapter 3). Let $k$ and $n$ be integers and let $r_n$ be the fraction
of the first $n$ outcomes which are heads. Let $p$ be the statement about
$r_n$ that $|r_n - \tfrac{1}{2}| < 1/k$. Consider the statement $q$ that

$$(\forall k > 0)(\exists N)(\forall n)(n > N \rightarrow p).$$

We write the statement in this form to demonstrate that its truth set
is in $\mathcal{G}$: for fixed $k$ and $n$ the truth set of $p$ lies in $\mathcal{F}$, and each quantifier over the integers corresponds to a denumerable intersection or union. In words, the statement $q$ asserts that for any $k > 0$ there
exists an $N$ such that if $n > N$, then $|r_n - \tfrac{1}{2}| < 1/k$. That is, $q$ says
that $\lim_{n \to \infty} r_n = \tfrac{1}{2}$. The truth set of the statement $q$ is in $\mathcal{G}$ but not in $\mathcal{F}$,
because the notion of limit cannot be expressed in terms of finitely
many of the $r_n$. The Strong Law of Large Numbers asserts that

$$\Pr[q] = 1.$$
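A simulation cannot prove a probability-one statement, but it shows the behavior that $q$ describes. The sketch below is illustrative only: it tosses a simulated fair coin using Python's `random` module (the seed and sample sizes are arbitrary choices) and reports $r_n$ at a few values of $n$.

```python
import random

def running_fractions(n_max, checkpoints, seed=0):
    """Fraction r_n of heads among the first n tosses, at the given checkpoints."""
    rng = random.Random(seed)
    heads, out = 0, {}
    for n in range(1, n_max + 1):
        heads += rng.getrandbits(1)      # 1 = heads, 0 = tails
        if n in checkpoints:
            out[n] = heads / n
    return out

checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n, r in sorted(running_fractions(100_000, checkpoints).items()):
    print(n, r, abs(r - 0.5))
```

Any single run shows only finitely many $r_n$; the statement $q$ concerns the whole infinite path, which is exactly why its truth set lies outside $\mathcal{F}$.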


4. Statements of probability zero or one 


In a probability space $(\Omega, \mathcal{B}, \mu)$ a statement with truth set $\Omega$ is
logically true, whereas a statement with truth set $\emptyset$ is logically false.
However, the Strong Law of Large Numbers asserts not that a certain
set is $\Omega$ but that it has measure one. A statement whose truth set
has measure one is said to be almost always true, almost everywhere true, almost
surely true, or true a.e. Correspondingly, a statement with truth set
of measure zero is almost surely false, and the negation $\sim p$ of a
statement $p$ which is almost surely false is almost surely true.

Two useful propositions are related to the subject of almost sure 
statements. 


Proposition 2-6: Let $\{p_n\}$ be a denumerable set of statements and let
$q$ be the statement that $p_n$ holds for all $n$. If $\Pr[p_n] = 1$ for all $n$,
then $\Pr[q] = 1$.


Proof: Let $\{P_n\}$ be the truth sets of the statements $\{p_n\}$. Applying
Proposition 1-18, we have

$$1 - \Pr[q] = \mu\Big(\Omega - \bigcap_n P_n\Big) = \mu\Big(\bigcup_n (\Omega - P_n)\Big) \le \sum_n \mu(\Omega - P_n) = 0,$$

so that

$$\Pr[q] = 1.$$


The second result is one of the Borel-Cantelli Lemmas. 


Proposition 2-7: If $\{p_n\}$ is a sequence of statements for which
$\sum \Pr[p_n]$ is finite, then with probability one only finitely many of the
$p_n$ are true.

Proof: Let $q_N$ be the statement that all of the statements $p_N, p_{N+1}, \ldots$
are false. Let $\epsilon > 0$ be given. Choose $N$ large enough so that
$\sum_{n=N}^{\infty} \Pr[p_n] < \epsilon$. Then

$$1 - \Pr[q_N] = \Pr[\text{at least one of } p_N, p_{N+1}, \ldots \text{ occurs}] = \Pr[p_N \vee p_{N+1} \vee \cdots].$$

By Proposition 1-18 the right side is

$$\le \sum_{n=N}^{\infty} \Pr[p_n] < \epsilon.$$

Hence,

$$\Pr[\text{finitely many } p_n \text{ are true}] = \Pr\Big[\bigvee_{n=1}^{\infty} q_n\Big] \ge \Pr[q_N] > 1 - \epsilon.$$

Since this inequality holds for every $\epsilon > 0$, the probability must be 1.
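The proof rests on the bound $\Pr[p_N \vee p_{N+1} \vee \cdots] \le \sum_{n \ge N} \Pr[p_n]$. The sketch below checks this bound exactly in one special case chosen for illustration. It assumes, beyond what the proposition requires, that the $p_n$ are independent with $\Pr[p_n] = 1/n^2$, truncated at $n = M$; then $\Pr[p_N \vee \cdots \vee p_M] = 1 - \prod_{n=N}^{M}(1 - 1/n^2)$, and the product telescopes to $(N-1)(M+1)/(NM)$.

```python
from fractions import Fraction

def union_bound(N, M):
    """sum_{n=N}^{M} Pr[p_n] with Pr[p_n] = 1/n**2."""
    return sum(Fraction(1, n * n) for n in range(N, M + 1))

def prob_some_true(N, M):
    """Exact Pr[p_N or ... or p_M], assuming the p_n are independent."""
    none_true = Fraction(1)
    for n in range(N, M + 1):
        none_true *= 1 - Fraction(1, n * n)
    return 1 - none_true

for N in (2, 10, 100):
    print(N, float(prob_some_true(N, 1000)), float(union_bound(N, 1000)))
```

Both columns shrink as $N$ grows, which is the mechanism of the proof: the tail of a convergent series controls the probability that any late statement is true.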


5. Conditional probabilities 


Let $(\Omega, \mathcal{B}, \mu)$ be a probability space. If $p$ and $q$ are statements
such that $\Pr[q] \ne 0$, the conditional probability of $p$ given $q$, written
$\Pr[p \mid q]$, is defined by

$$\Pr[p \mid q] = \Pr[p \wedge q]/\Pr[q].$$

If $\Pr[q] = 0$, we shall normally agree that $\Pr[p \mid q] = 0$. (Alternatively,
we might leave $\Pr[p \mid q]$ undefined if $\Pr[q] = 0$. Such a convention
would be adopted in a more general context.)

The case $\Pr[q] = 0$ is not very interesting. Suppose $\Pr[q] \ne 0$, and
let $Q \subset \Omega$ be the truth set of $q$. If $Q$ is taken as a space of points, then
$\mathcal{B}' = \{B' \mid B' = Q \cap B,\ B \in \mathcal{B}\}$ is a Borel field of sets in $Q$. For any
set $B'$ in $\mathcal{B}'$ we define a set function $\nu$ by

$$\nu(B') = \frac{\mu(B')}{\mu(Q)}.$$

The reader should verify that $\nu$ is a measure on $\mathcal{B}'$. Furthermore,
$\nu(Q) = \mu(Q)/\mu(Q) = 1$. Therefore, $(Q, \mathcal{B}', \nu)$ is a probability space.
We may thus speak of the probabilities of statements relative to $Q$,
and we see that their values coincide with conditional probabilities
given $q$ and relative to $\Omega$. That is, conditional probabilities possess
the same properties as ordinary probabilities.
We shall apply this notion of conditional probability to the sequence
space considered in the preceding sections. In this space the sets of $\mathcal{B}_n$
are denumerable disjoint unions of sets of the form

$$B_n = \{\omega \mid x_0 \in S_0 \wedge \cdots \wedge x_n \in S_n\}.$$




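Since the state space $S$ is denumerable, each of the sets $S_0, S_1, \ldots, S_n$
is denumerable and $B_n$ is the denumerable union of disjoint sets of the
form $\{\omega \mid x_0 = c_0 \wedge \cdots \wedge x_n = c_n\}$. By definition of conditional
probability,

$$\Pr[x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1} \wedge x_n = c_n] = \Pr[x_n = c_n \mid x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1}] \times \Pr[x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1}].$$

By induction, we find

$$\begin{aligned} \Pr[x_0 = c_0 \wedge \cdots \wedge x_n = c_n] = {} & \Pr[x_0 = c_0] \cdot \Pr[x_1 = c_1 \mid x_0 = c_0] \\ & \times \Pr[x_2 = c_2 \mid x_0 = c_0 \wedge x_1 = c_1] \\ & \times \Pr[x_3 = c_3 \mid x_0 = c_0 \wedge x_1 = c_1 \wedge x_2 = c_2] \\ & \times \cdots \times \Pr[x_n = c_n \mid x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1}]. \end{aligned}$$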


We have established the following result. 


Proposition 2-8: The measure on a sequence space is completely
determined by

(1) the starting probabilities, $\Pr[x_0 = c_0]$, and
(2) the transition probabilities,

$$\Pr[x_n = c_n \mid x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1}].$$
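Proposition 2-8 can be checked on a finite tree. The sketch below starts from an arbitrary joint distribution on paths of length 3 over $S = \{0, 1\}$ (the weights 1 through 8 are made up for illustration), extracts the starting and transition probabilities by conditioning, and multiplies them back together as in the product formula for $\Pr[x_0 = c_0 \wedge \cdots \wedge x_n = c_n]$.

```python
from fractions import Fraction
from itertools import product

# Made-up joint probabilities Pr[x0 = c0, x1 = c1, x2 = c2]; normalized to sum to 1.
raw = dict(zip(product((0, 1), repeat=3), (1, 2, 3, 4, 5, 6, 7, 8)))
total = sum(raw.values())
joint = {path: Fraction(w, total) for path, w in raw.items()}

def prob_prefix(prefix):
    """Pr[x_0 = c_0, ..., x_{len-1} = c_{len-1}], obtained by summing the joint."""
    return sum(p for path, p in joint.items() if path[:len(prefix)] == prefix)

def transition(prefix, c):
    """Pr[x_n = c | x_0 = c_0, ..., x_{n-1} = c_{n-1}] with n = len(prefix)."""
    return prob_prefix(prefix + (c,)) / prob_prefix(prefix)

def chain_rule(path):
    """Starting probability times the successive transition probabilities."""
    value = prob_prefix(path[:1])
    for n in range(1, len(path)):
        value *= transition(path[:n], path[n])
    return value

print(all(chain_rule(path) == p for path, p in joint.items()))
```

Every prefix here has positive probability, so the convention $\Pr[p \mid q] = 0$ for $\Pr[q] = 0$ never comes into play.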


Conditional probabilities, as we anticipated, have their place in tree 
diagrams. To each branch we may assign a conditional probability. 
We see now the abstract formulation of the fact that the probabilities 
of statements like $x_0 = c_0 \wedge x_1 = c_1 \wedge \cdots \wedge x_n = c_n$ are computed
simply by multiplying together the appropriate branch weights.


[Figure: tree diagram for the sequence space $\Omega$ with $S = \{H, T\}$; levels labelled $x_0$, $x_1$, $x_2$.]
Two statements $p$ and $q$ are said to be probabilistically independent
if $\Pr[p \wedge q] = \Pr[p]\Pr[q]$. A stochastic process defined on sequence
space is called an independent process if the statements

$$x_n = c_n \quad\text{and}\quad x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1}$$

are independent for all $n \ge 1$ and for all $c_0, c_1, \ldots, c_n$. Coin tossing is
an example. We shall see that an independent process is a special
kind of Markov process. If, in addition, for each $c$ the probability
that $x_n = c$ does not depend on $n$, then the process is called an
independent trials process.


6. Random variables and means 


Let $(\Omega, \mathcal{B}, \mu)$ be a probability space. A measurable function $f$ with
domain $\Omega$ and range in the extended real number system is
called a random variable. We may apply all the properties of the
measurable functions in Section 3 of Chapter 1.

If $f(\omega)$ is an extended-real-valued function defined on the space $\Omega$
of a stochastic process and if $f$ has the property that for some $n$, $f(\omega)$
is measurable with respect to the Borel field $\mathcal{B}_n$, then $f$ is a random
variable because $\mathcal{B}_n \subset \mathcal{B}$. Such a function is said to be $\mathcal{B}_n$-measurable.
For the special case in which $\Omega$ is sequence space, a function $f$ is
$\mathcal{B}_n$-measurable for some $n$ if, for some fixed $n$, its values depend only on the outcomes
$x_0, \ldots, x_n$.

In terms of sequence space, we give two examples of random 
variables. 


(1) Define a function $u_j^{(n)}$ by

$$u_j^{(n)}(\omega) = \begin{cases} 1 & \text{if } x_n(\omega) = j, \\ 0 & \text{otherwise.} \end{cases}$$

Since the value of $u_j^{(n)}$ depends only on outcome $n$, the function $u_j^{(n)}$
is $\mathscr{B}_n$-measurable and is therefore a random variable. Letting
$n_j = \sum_{n=0}^{\infty} u_j^{(n)}$ and noting that the limit of a sequence of random variables
is a random variable, we see that $n_j$ is a random variable. The function
$n_j(\omega)$ counts the number of times that the outcome $j$ occurs on the
path $\omega$; it will appear again after we introduce Markov chains.

(2) Let $t_j$ be the time to reach $j$. That is, define $t_j(\omega)$ to be the first
time $n$ along path $\omega$ such that $x_n(\omega) = j$. If $j$ is never reached, set
$t_j = +\infty$. Then $t_j$ is a random variable because it is the limit on $n$ of
the function $t_j^{(n)} = \inf(t_j, n)$, which is $\mathscr{B}_n$-measurable.
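On a finite prefix of a path, the random variables of Examples (1) and (2) can be computed directly. In this hypothetical Python sketch a path is a list of outcomes; `u(path, n, j)` plays the role of $u_j^{(n)}$, `visits` returns the partial sum $\sum_{n<N} u_j^{(n)}$ (whose limit as $N \to \infty$ is $n_j$), and `truncated_hitting_time` returns $t_j^{(n)} = \inf(t_j, n)$. The function names and the sample path are assumptions for illustration.

```python
import math


def u(path, n, j):
    """Indicator of the statement x_n = j; it depends only on outcome n."""
    return 1 if path[n] == j else 0


def visits(path, j, N):
    """Partial sum u_j^(0) + ... + u_j^(N-1); n_j is its limit as N grows."""
    return sum(u(path, n, j) for n in range(N))


def hitting_time(path, j):
    """First n with x_n = j on the observed prefix, or math.inf if j has not appeared.

    On a finite prefix, math.inf only means "not reached yet".
    """
    for n, x in enumerate(path):
        if x == j:
            return n
    return math.inf


def truncated_hitting_time(path, j, n):
    """t_j^(n) = inf(t_j, n); it is determined by the outcomes x_0, ..., x_n."""
    return min(hitting_time(path[: n + 1], j), n)


path = [2, 3, 1, 3, 3, 2]
print(visits(path, 3, len(path)))          # 3
print(hitting_time(path, 1))               # 2
print(truncated_hitting_time(path, 1, 1))  # 1 (j = 1 not yet reached by time 1)
print(truncated_hitting_time(path, 1, 4))  # 2
```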
Definition 2-9: A random time $t$ is a random variable satisfying these
two properties:

(1) Its range is in the non-negative integers with $\{+\infty\}$ adjoined.
(2) For each integer $n$ the set $\{\omega \mid t(\omega) = n\}$ is a set of $\mathscr{B}_n$.

The random variable $t_j$ defined in the second example above is an
example of a random time.
The mean of a random variable $f$, denoted $M[f]$, is defined by
$M[f] = \int_\Omega f \, d\mu$, where $M[f]$ exists if and only if $\int_\Omega f \, d\mu$ exists and where $\mu$
is the measure associated with the probability space. Since means are
Lebesgue integrals, they satisfy all the properties of Lebesgue integrals.
In particular, if $\{f_n\}$ is a sequence of non-negative random variables,
then by Corollary 1-46,

$$M\Big[\sum_n f_n\Big] = \sum_n M[f_n].$$

In addition, if $\{g_n\}$ is a sequence of random variables with the properties
that $|g_n| \le c$ for every $n$ and that $g_n$ converges to $g$, then

$$M[g] = \lim_n M[g_n]$$

by Corollary 1-50. An important application of these facts is the
following result.


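Proposition 2-10: In a sequence space,

$$M[n_j] = \sum_{n=0}^{\infty} \Pr[x_n = j].$$

PROOF:

$$\begin{aligned}
M[n_j] &= M\Big[\sum_n u_j^{(n)}\Big] \\
&= \sum_n M[u_j^{(n)}] \\
&= \sum_n \int_\Omega u_j^{(n)} \, d\mu \\
&= \sum_n \int_{\{\omega \mid x_n(\omega) = j\}} 1 \, d\mu \\
&= \sum_n \Pr[x_n = j].
\end{aligned}$$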
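A quick numerical check of Proposition 2-10, restricted to a finite horizon so that every path can be enumerated, is sketched below. It assumes an independent trials process on $\{1, 2, 3\}$ with hypothetical probabilities $1/2$, $1/3$, $1/6$ and compares the mean number of occurrences of $j$ among the first $N$ outcomes with $\sum_{n<N} \Pr[x_n = j]$; letting $N \to \infty$ on both sides gives the proposition.

```python
from fractions import Fraction
from itertools import product

# Hypothetical one-step distribution of an independent trials process.
P = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
N = 4  # finite horizon: outcomes x_0, ..., x_{N-1}


def paths_with_probability():
    """Yield every path of length N together with its probability."""
    for path in product(P, repeat=N):
        prob = Fraction(1)
        for x in path:
            prob *= P[x]
        yield path, prob


def mean_visits(j):
    """M[u_j^(0) + ... + u_j^(N-1)], computed path by path."""
    return sum(prob * path.count(j) for path, prob in paths_with_probability())


def sum_of_probabilities(j):
    """Sum over n < N of Pr[x_n = j], computed one time index at a time."""
    return sum(
        sum(prob for path, prob in paths_with_probability() if path[n] == j)
        for n in range(N)
    )


for j in P:
    # Both columns agree; for independent trials they equal N * Pr[x_0 = j].
    print(j, mean_visits(j), sum_of_probabilities(j))
```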


7. Means conditional on statements 


Suppose $(\Omega, \mathscr{B}, \mu)$ is a probability space on which a denumerable
stochastic process is defined. We say that a denumerable family of
subsets $\mathscr{R} = \{R_1, R_2, \ldots\}$ is a partition of $\Omega$ if the sets $R_i$ are disjoint,
exhaustive, and measurable. Each such subset $R_i$ is called a cell of the
partition; we allow the possibility that a cell may have measure zero.
The reader should notice that the sets $\{\emptyset, R_1, R_2, \ldots\}$ together with
all possible denumerable unions form a Borel field which we shall call
$\mathscr{R}^*$. Since the sets in $\mathscr{R}$ are measurable, we see that $\mathscr{R}^* \subset \mathscr{B}$.

Two examples of partitions are typical. 


(1) Let the process be coin tossing, and define a partition by
$R_1 = \{\omega \mid x_0(\omega) \text{ is a head}\}$, $R_2 = \{\omega \mid x_0(\omega) \text{ is a tail}\}$.

More generally, let $\mathscr{R}$ consist of disjoint exhaustive measurable sets
which are in $\mathscr{B}_n$ for some fixed $n$.

(2) Suppose $f$ is a random variable whose range is denumerable.
If the range is $\{a_i\}$, define $R_i = \{\omega \mid f(\omega) = a_i\}$. Then $\{R_i\}$ is a
partition.

A denumerable set of statements $\{q_i\}$ about a stochastic process is
said to be a set of alternatives if the truth sets $Q_i$ of the statements form
a partition. Since the integral is a completely additive set function,
we obtain, for every random variable $f$ whose mean exists, the relation

$$M[f] = \sum_i \int_{Q_i} f \, d\mu.$$
Let $p$ be a statement with measurable truth set $P$. If $f$ is a random
variable and if $\Pr[p] \ne 0$, then the conditional mean of $f$ given $p$,
written $M[f \mid p]$, is defined by

$$M[f \mid p] = \frac{1}{\Pr[p]} \int_P f \, d\mu.$$

If $\Pr[p] = 0$, then $M[f \mid p]$ is defined to be zero. (In a more general
setting, $M[f \mid p]$ is not defined when $\Pr[p] = 0$.)


Proposition 2-11: If $M[f]$ exists and $\{q_i\}$ is a set of alternatives with
truth sets $\{Q_i\}$, then

$$M[f] = \sum_i \Pr[q_i] \, M[f \mid q_i].$$

PROOF: By definition of conditional mean, we have

$$\int_{Q_i} f \, d\mu = \Pr[q_i] \, M[f \mid q_i]$$

(both sides are 0 when $\Pr[q_i] = 0$). Summing both sides of the equation on $i$
gives the result immediately.
Corollary 2-12: Let $p$ be a statement with measurable truth set $P$,
and let $\{q_i\}$ be a set of alternatives with truth sets $Q_i$. Then

$$\Pr[p] = \sum_i \Pr[q_i] \Pr[p \mid q_i] = \sum_i \Pr[p \wedge q_i].$$

PROOF: Let $f$ be the characteristic function of the set $P$ and apply
Proposition 2-11.
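Proposition 2-11 and Corollary 2-12 can be checked exactly on a small example. The Python sketch below assumes a fair die as the probability space, takes $f$ to be the number shown, and uses as alternatives the three statements "the number shown leaves remainder $i$ on division by 3"; all of these choices, and the helper names, are hypothetical.

```python
from fractions import Fraction

# A fair die (assumed example): outcomes 1..6, each with probability 1/6.
OMEGA = range(1, 7)
PROB = {w: Fraction(1, 6) for w in OMEGA}


def pr(event):
    return sum(PROB[w] for w in OMEGA if event(w))


def cond_mean(f, event):
    """M[f | p] = (1 / Pr[p]) * integral of f over P; defined as 0 when Pr[p] = 0."""
    p = pr(event)
    if p == 0:
        return Fraction(0)
    return sum(PROB[w] * f(w) for w in OMEGA if event(w)) / p


def f(w):
    return w  # the number shown


def even(w):
    return w % 2 == 0


# Alternatives q_0, q_1, q_2: their truth sets {3, 6}, {1, 4}, {2, 5} partition OMEGA.
alternatives = [lambda w, i=i: w % 3 == i for i in range(3)]

mean = sum(PROB[w] * f(w) for w in OMEGA)
total = sum(pr(q) * cond_mean(f, q) for q in alternatives)
print(mean, total)  # 7/2 7/2  (Proposition 2-11)

print(pr(even), sum(pr(lambda w, q=q: even(w) and q(w)) for q in alternatives))  # 1/2 1/2
```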


8. Problems

1. For coin tossing, show that
(a) the probability of getting only finitely many “heads” is 0;
(b) there will be infinitely many “heads” a.e.


2. Consider the experiment of selecting “1” with probability 1/3, “2” with
probability 1/3, or “3” with probability 1/3. If this experiment is repeated
infinitely often, show that the probability of selecting “1” only finitely
often is 0.


3. In Problem 2, let $\mathscr{B}_n = \mathscr{F}_n$. Which of the following $f_n$ form a denumerable
stochastic process $(f_n, \mathscr{B}_n)$?
(a) $f_n$ = the $n$th outcome.
(b) $f_n$ = time at which the $n$th “3” is selected.
(c) $f_n = s_n$ = sum of the first $n$ numbers selected.
(d) $f_n = \dfrac{s_n - na}{c\sqrt{n}}$, where $a$ and $c$ are constants.


4. Show that the following converse of the Borel-Cantelli Lemma is false:

If $\sum_n \Pr[p_n] = +\infty$, then the probability that only finitely many $p_n$ are
true is less than 1.


5. Start with 4 jacks and 4 queens from a bridge deck. Find the
probability of drawing two cards, both of which are jacks. Then compute
the conditional probability of the same on each of the following 
conditions: 

(a) One card drawn is a jack. 

(b) One card drawn is a red jack. 

(c) One card drawn is the jack of hearts. 

(d) The first card drawn is the jack of hearts. 


6. For coin tossing, define three random variables which are not measurable
with respect to any finite tree $\Omega_n$.


7. Let ${}^i n_j(\omega)$ be the number of times on path $\omega$ that a $j$ occurs before the
first $i$ occurs (if $i = j$, take ${}^i n_j(\omega) = 0$). Prove that ${}^i n_j$ is a random
variable, and develop an infinite series representation for $M[t_i]$ in terms
of it.


8. In coin tossing, let $t$ be the first time that “heads” comes up.
(a) Is $t$ a random time?
(b) Find $M[t]$.
(c) Find $M[t \mid \text{first outcome is “tails”}]$.
9. In a randomly selected two-child family, let $f$ = the number of boys.


Find 

(a) M[f]. 

(b) M[f | first child is a boy]. 

(c) M[f | there is at least one boy]. 


10. For coin tossing, let $f$ = the number of “heads” in the first three tosses.
Let $q_i$ be the statement that there were exactly $i$ “heads” in the first
two tosses ($i = 0, 1, 2$). Find $M[f]$, $M[f \mid q_0]$, $M[f \mid q_1]$, and $M[f \mid q_2]$.
Verify that the first of these is a linear combination of the last three with
appropriate probabilities as weights.


11. Let $\{q_i\}$ be a set of alternatives and let $f$ be a non-negative random
variable. Prove that if $M[f \mid q_i] > M[f]$ for some $i$, then there is an alternative $q_j$
such that $M[f \mid q_j] < M[f]$.


12. Let $X$ be the unit interval. Let $F$ consist of all finite unions of finite
intervals (with or without endpoints). We classify a point as an interval
of length 0. The measure to be constructed in this problem is called
Lebesgue measure, and the Borel field is the class of all Borel sets.

(a) Show that $F$ is a field.

(b) Show that every set $A$ in $F$ can be written as a finite disjoint union
of intervals $A_1, A_2, \ldots, A_n$.

(c) If $A$ is decomposed as in part (b), we define

$$\nu(A) = \sum_{k=1}^{n} l(A_k),$$

where $l$ denotes length. Show that $\nu$ is consistently defined and that
$\nu$ is a non-negative additive set function.

(d) Show that if $A$ is a finite union of intervals and if $\epsilon > 0$ is given,
then there is a finite union $K$ of closed intervals and a finite union $G$
of open intervals such that $K \subset A$, $A \subset G$, and

$$\nu(K) + \epsilon \ge \nu(A) \ge \nu(G) - \epsilon.$$

[Note: An open interval here means a set which is the intersection of
$X$ with an open interval of the real line.]

(e) Let $A$ and $A_n$, $n = 1, 2, \ldots$, be finite unions of intervals with the $A_n$
disjoint and with $\bigcup_n A_n = A$. Use parts (c) and (d) and the Heine-Borel
Theorem to prove that, for any $\epsilon > 0$,

$$\nu(A) \le \sum_n \nu(A_n) + \epsilon.$$

(f) Deduce from part (e) that $\nu$ is completely additive on $F$.

(g) Apply Theorem 1-19 and describe the resulting measure space.

(h) Using complete additivity, prove that a denumerable set of points
has measure 0.

(i) Why does the proof of (h) not show that every set has measure 0?

(j) Show directly from the definition

$$\mu(B) = \inf \Big[\sum_n \nu(A_n)\Big]$$

that every denumerable set of points has measure 0.
13. Let $(\Omega, \mathscr{B}, \mu)$ be the unit interval with Lebesgue measure. If $x \in \Omega$, let
f(x) = x?. Let p be the statement “x < $.” 
(a) Show that f is a random variable. 
(b) Find M[f]. 
(c) Find $M[f \mid p]$ and $M[f \mid \sim p]$.
(d) Relate your answer in (b) to the solution of (c). 


CHAPTER 3 


MARTINGALES 


1. Means conditional on partitions and functions 


In this chapter we consider a natural abstraction of the idea of a 
fair game in gambling. We shall give several applications of the basic 
result, the Martingale Convergence Theorem. 

We begin by defining what we mean by the conditional mean of $f$
given a partition $\mathscr{R}$ of the domain. Let $(\Omega, \mathscr{B}, \mu)$ be a measure space.
We shall normally assume $\mu(\Omega) \le 1$, but such an assumption is not
necessary as long as $\mu$ is finite.


Definition 3-1: Let $\omega \in \Omega$ and let $r_i$ be the statement that $\omega$ is in a
cell $R_i$ of $\mathscr{R}$. If $f$ is a random variable, then the conditional mean of
$f$ given $\mathscr{R}$, written $M[f \mid \mathscr{R}]$, is defined to be a function of $\omega$ whose value
at every point in the cell $R_i$ is the constant $M[f \mid r_i]$.


Next, we observe that if $M[f]$ exists and is finite, then $M[f \mid \mathscr{R}]$
exists and is finite. For, on cells of measure zero, the conditional mean
is defined to be zero. If $\mu(R_i) > 0$, then on $R_i$

$$M[|f| \mid \mathscr{R}] = M[|f| \mid r_i] = \frac{1}{\mu(R_i)} \int_{R_i} |f| \, d\mu \le \frac{1}{\mu(R_i)} M[|f|] < \infty.$$


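The next proposition provides an equivalent definition of the mean
of $f$ given $\mathscr{R}$.

Proposition 3-2: $M[f \mid \mathscr{R}]$ is characterized almost everywhere by
these two properties:

(1) $M[f \mid \mathscr{R}]$ is constant on each cell of $\mathscr{R}$.

(2) $\int_{R_i} f \, d\mu = \int_{R_i} M[f \mid \mathscr{R}] \, d\mu$ for every cell $R_i$ of $\mathscr{R}$.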


PROOF: We must show that $M[f \mid \mathscr{R}]$ has these two properties and
that any random variable satisfying these properties is equal to the
conditional mean of $f$ given $\mathscr{R}$ a.e. First let $g = M[f \mid \mathscr{R}]$. Then $g$
satisfies (1) by definition. If $\mu(R_i) = 0$, both sides of (2) are 0. If not,
equality again follows from the definition of $M[f \mid \mathscr{R}]$.

Conversely, suppose a function $g$ satisfies (1) and (2), and let $c_i$ be its
constant value on a cell $R_i$ of positive measure. Then (2) gives
$c_i \, \mu(R_i) = \int_{R_i} f \, d\mu$, so that $c_i = M[f \mid r_i]$. Thus $g$ agrees with
$M[f \mid \mathscr{R}]$ on all paths in cells of positive measure; since the cells of
measure zero form a denumerable family, their union has measure zero, and
$g$ agrees with $M[f \mid \mathscr{R}]$ a.e.


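We give two examples of conditional means.

(1) Let the stochastic process be coin tossing and let
$R_1 = \{\omega \mid x_0(\omega) \text{ is a head}\}$,
$R_2 = \{\omega \mid x_0(\omega) \text{ is a tail}\}$.
Let $f$ be the total number of heads on the zeroth and first tosses. Then

$$M[f \mid \mathscr{R}] = \begin{cases} \tfrac{3}{2} & \text{on } R_1, \\[2pt] \tfrac{1}{2} & \text{on } R_2, \end{cases}$$

and

$$M[f] = \big(\tfrac{1}{2}\big)\big(\tfrac{3}{2}\big) + \big(\tfrac{1}{2}\big)\big(\tfrac{1}{2}\big) = 1.$$

(2) For any stochastic process and for any denumerable-valued
random variable $f$, let $\mathscr{R}$ be the trivial partition $\{\Omega\}$. Then

$$M[f \mid \mathscr{R}] = M[f]$$

on every path $\omega$.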
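Example (1) can be reproduced by brute force. Since $f$ depends only on the zeroth and first tosses, the hypothetical Python sketch below works on the four equally likely two-toss paths, computes $M[f \mid \mathscr{R}]$ cell by cell as in Definition 3-1, and checks property (2) of Proposition 3-2. The representation of paths as tuples and the helper names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

# First two tosses of a fair coin; each of the four paths has probability 1/4.
OMEGA = list(product("HT", repeat=2))
PROB = {w: Fraction(1, 4) for w in OMEGA}


def cond_mean_given_partition(f, cells):
    """Return M[f | R] as a dict w -> value, constant on each cell (Definition 3-1)."""
    result = {}
    for cell in cells:
        mass = sum(PROB[w] for w in cell)
        value = sum(PROB[w] * f(w) for w in cell) / mass if mass else Fraction(0)
        for w in cell:
            result[w] = value
    return result


def heads(w):
    """Total number of heads on the zeroth and first tosses."""
    return w.count("H")


R1 = [w for w in OMEGA if w[0] == "H"]
R2 = [w for w in OMEGA if w[0] == "T"]
g = cond_mean_given_partition(heads, [R1, R2])
print(g[("H", "T")], g[("T", "H")])                          # 3/2 1/2
print(cond_mean_given_partition(heads, [OMEGA])[("T", "T")])  # 1, the trivial partition

# Property (2) of Proposition 3-2: f and M[f | R] have the same integral over each cell.
for cell in (R1, R2):
    assert sum(PROB[w] * heads(w) for w in cell) == sum(PROB[w] * g[w] for w in cell)
```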
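A partition $\mathscr{R}$ is said to be contained in $\mathscr{S}$, written $\mathscr{R} \subset \mathscr{S}$, if every
cell of $\mathscr{R}$ is a union of cells of $\mathscr{S}$. If $\mathscr{R} \subset \mathscr{S}$, then $\mathscr{R}$ is the “coarser”
subdivision, and $\mathscr{S}$ is called a refinement of $\mathscr{R}$.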


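Proposition 3-3: Conditional means satisfy these properties:

(1) $M[M[f \mid \mathscr{S}] \mid \mathscr{R}] = M[f \mid \mathscr{R}]$ if $\mathscr{R} \subset \mathscr{S}$.

(2) $M[M[f \mid \mathscr{S}]] = M[f]$ for any $\mathscr{S}$.

(3) If $g$ is constant on each cell of $\mathscr{R}$, then $M[g \mid \mathscr{R}] = g$ a.e., and if
$M[fg]$ exists and is finite, then $M[fg \mid \mathscr{R}] = g \, M[f \mid \mathscr{R}]$.

(4) If $f$ and $g$ take only finite values and if $M[f \mid \mathscr{R}]$ and
$M[g \mid \mathscr{R}]$ both exist and are finite, then $M[f + g \mid \mathscr{R}]$ exists, is
finite, and is given by

$$M[f + g \mid \mathscr{R}] = M[f \mid \mathscr{R}] + M[g \mid \mathscr{R}].$$

PROOF: For (1) it is sufficient to show that

$$\int_{R_j} M[M[f \mid \mathscr{S}] \mid \mathscr{R}] \, d\mu = \int_{R_j} M[f \mid \mathscr{R}] \, d\mu$$

for every cell $R_j$ of $\mathscr{R}$, since both functions are constant on cells of $\mathscr{R}$.
Applying property (2) of Proposition 3-2 three times, we have

$$\begin{aligned}
\int_{R_j} M[M[f \mid \mathscr{S}] \mid \mathscr{R}] \, d\mu
&= \int_{R_j} M[f \mid \mathscr{S}] \, d\mu \\
&= \sum_{i \,:\, S_i \subset R_j} \int_{S_i} M[f \mid \mathscr{S}] \, d\mu \\
&= \sum_{i \,:\, S_i \subset R_j} \int_{S_i} f \, d\mu \\
&= \int_{R_j} f \, d\mu \\
&= \int_{R_j} M[f \mid \mathscr{R}] \, d\mu.
\end{aligned}$$

The proofs of (3) and (4) use the same technique and are left to the
reader. Property (2) follows from (1) with $\mathscr{R}$ taken as the trivial
partition $\{\Omega\}$.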
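Properties (1) and (2) can be checked on a small finite example. In this hypothetical Python sketch the sample space is three tosses of a fair coin, $\mathscr{R}$ is the partition determined by $x_0$, and $\mathscr{S}$ is the finer partition determined by $(x_0, x_1)$, so that $\mathscr{R} \subset \mathscr{S}$ in the sense of the text (every cell of $\mathscr{R}$ is a union of cells of $\mathscr{S}$). The choice of $f$ as the number of heads and all helper names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Three tosses of a fair coin; every path has probability 1/8 (assumed example).
OMEGA = list(product("HT", repeat=3))
PROB = {w: Fraction(1, 8) for w in OMEGA}


def partition_by(key):
    """The natural partition induced by key(w): one cell per distinct value."""
    cells = {}
    for w in OMEGA:
        cells.setdefault(key(w), []).append(w)
    return list(cells.values())


def cond_mean(f, cells):
    """M[f | partition] as a dict w -> value, constant on each cell."""
    result = {}
    for cell in cells:
        mass = sum(PROB[w] for w in cell)
        value = sum(PROB[w] * f(w) for w in cell) / mass if mass else Fraction(0)
        result.update({w: value for w in cell})
    return result


def f(w):
    return w.count("H")


coarse = partition_by(lambda w: w[0])   # R: cells determined by x_0
fine = partition_by(lambda w: w[:2])    # S: cells determined by (x_0, x_1)

inner = cond_mean(f, fine)                       # M[f | S]
tower = cond_mean(lambda w: inner[w], coarse)    # M[M[f | S] | R]
direct = cond_mean(f, coarse)                    # M[f | R]
print(tower == direct)                           # True: property (1)
print(sum(PROB[w] * inner[w] for w in OMEGA))    # 3/2, which is M[f]: property (2)
```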


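Definition 3-4: The cross partition $\mathscr{R} \otimes \mathscr{S}$ of two partitions $\mathscr{R}$ and
$\mathscr{S}$ is the family of sets defined by

$$\mathscr{R} \otimes \mathscr{S} = \{R_i \cap S_j \mid R_i \in \mathscr{R},\ S_j \in \mathscr{S},\ R_i \cap S_j \ne \emptyset\}.$$

For example:

[Figure: a partition $\mathscr{R}$, a partition $\mathscr{S}$, and their cross partition $\mathscr{R} \otimes \mathscr{S}$.]

Since the intersection of measurable sets is measurable and since the
sets $\{R_i \cap S_j\}$ form a denumerable family of disjoint, exhaustive sets, a
cross partition is a partition.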
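For a finite sample space the cross partition is easy to compute. In this hypothetical Python sketch, cells are represented as frozensets of sample points, and the two example partitions of $\{1, \ldots, 6\}$ are assumed for illustration; the last line reflects the commutativity of the operation noted below.

```python
def cross_partition(R, S):
    """All nonempty intersections R_i ∩ S_j of cells of R and S (Definition 3-4)."""
    return [r & s for r in R for s in S if r & s]


R = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]
S = [frozenset({1, 4}), frozenset({2, 3, 5, 6})]
RS = cross_partition(R, S)

print(sorted(sorted(cell) for cell in RS))  # [[1], [2, 3], [4], [5, 6]]
print(set(RS) == set(cross_partition(S, R)))  # True: the cross partition is commutative
```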

In Example 2 of Section 2-7 we noted that every denumerable-valued
random variable determines a natural partition of $\Omega$. We denote by
$\mathscr{R}_g$ the partition associated with the denumerable-valued random
variable $g$. In terms of the natural partition induced by $g$, we
define the conditional mean of $f$ given $g$ by

$$M[f \mid g] = M[f \mid \mathscr{R}_g].$$

Then $M[f \mid g]$ is constant on sets where $g$ is constant, and on the set
where $g = c$ for some constant $c$, $M[f \mid g]$ has the value $M[f \mid g = c]$.
Observing that the operation of forming the cross partition of two
partitions is both associative and commutative, we define more
generally

$$M[f \mid f_0 \wedge \cdots \wedge f_n] = M[f \mid \mathscr{R}_{f_0} \otimes \cdots \otimes \mathscr{R}_{f_n}].$$
If $p$ is a statement with truth set $P$, we define

$$\Pr[p \mid f_0 \wedge \cdots \wedge f_n] = M[\chi_P \mid f_0 \wedge \cdots \wedge f_n],$$

where $\chi_P$ is the characteristic function of the set $P$.

A sequence of random variables $\{f_n\}$ is said to be independent if, for
every $n$ and for every $A$,

$$\Pr[f_{n+1} \in A \mid f_0 \wedge \cdots \wedge f_n] = \Pr[f_{n+1} \in A]$$

almost everywhere. The reader should show that if $\{f_n\}$ is a sequence
of independent random variables, then

$$M[f_{n+1} \mid f_0 \wedge \cdots \wedge f_n] = M[f_{n+1}].$$

In the special case in which, for each $A$, $\Pr[f_n \in A]$ does not depend on
$n$, the random variables are said to be identically distributed.


2. Properties of martingales 


With the background of Section 3-1, we proceed to define martingales
and to give several examples of them. We still work with the
probability space $(\Omega, \mathscr{B}, \mu)$ and a denumerable set of states $S$. We
assume, however, that the set $S$ is a subset of the extended real number
system.


Definition 3-5: Let $\{f_n\}$ be a sequence of denumerable-valued random
variables, and suppose $\mathscr{R}_0 \subset \mathscr{R}_1 \subset \cdots$ is an increasing sequence of
partitions of $\Omega$. The pair $(f_n, \mathscr{R}_n)$ is called a martingale if three
conditions are satisfied:

(1) $M[|f_n|]$ is finite for each $n$.
(2) $f_n$ is constant on cells of $\mathscr{R}_n$.
(3) $f_n = M[f_{n+1} \mid \mathscr{R}_n]$.

If (1) and (2) are satisfied and (3) is replaced by $f_n \ge M[f_{n+1} \mid \mathscr{R}_n]$, then
$(f_n, \mathscr{R}_n)$ is a supermartingale. If (1) and (2) are satisfied and (3) is
replaced by $f_n \le M[f_{n+1} \mid \mathscr{R}_n]$, then $(f_n, \mathscr{R}_n)$ is a submartingale.

For a martingale, condition (3) in Definition 3-5 implies condition
(2), but for a supermartingale or a submartingale it does not.

When we defined partitions in Section 2-7, we noted that every
partition $\mathscr{R}_n$ determines a Borel field $\mathscr{R}_n^*$ and that this Borel field
satisfies $\mathscr{R}_n^* \subset \mathscr{B}$. Since $\mathscr{R}_n \subset \mathscr{R}_{n+1}$ implies $\mathscr{R}_n^* \subset \mathscr{R}_{n+1}^*$, we see
that a martingale is a stochastic process. The reader should notice
that the condition that $f_n$ is constant on cells of $\mathscr{R}_n$ is equivalent to the
condition that $f_n$ is measurable with respect to $\mathscr{R}_n^*$.

Throughout the discussion of martingales, it is convenient to keep in
mind the idea of a fair game, which we shall introduce as our first
example below. We shall see that the fair game is a special case of
the following situation. Let $f_0, f_1, \ldots$ be a sequence of denumerable-valued
random variables defined on a probability space such that each
$M[|f_n|]$ is finite. Setting $\mathscr{R}_n = \mathscr{R}_{f_0} \otimes \cdots \otimes \mathscr{R}_{f_n}$, we see that the
sequence $\{\mathscr{R}_n\}$ is increasing and that $f_n$ is constant on cells of $\mathscr{R}_n$.
Therefore, only condition (3) of Definition 3-5 need be checked for
$(f_n, \mathscr{R}_n)$ to be a martingale. In particular, such a sequence of random
variables forms a martingale if and only if

$$M[f_{n+1} \mid f_0 \wedge f_1 \wedge \cdots \wedge f_n] = f_n.$$

When the partitions $\mathscr{R}_n$ are obtained from the $f_n$ in the way we have
just described, we agree to refer to the pair $(f_n, \mathscr{R}_n)$ simply as $\{f_n\}$.

We shall give three examples of martingales at this time. More
examples will appear after we introduce Markov chains.

(1) Let $\{y_n\}$ be a sequence of independent random variables with
denumerable range and finite means, and let $s_n = y_0 + \cdots + y_n$ represent the $n$th
partial sum. Then $\{s_n\}$, with its partitions obtained from the $s_n$ in the
natural way, is a martingale if and only if $M[y_k] = 0$ for every $k \ge 1$. We
have

$$\begin{aligned}
M[s_{n+1} \mid s_0 \wedge \cdots \wedge s_n] &= M[y_{n+1} + s_n \mid s_0 \wedge \cdots \wedge s_n] \\
&= M[y_{n+1} \mid s_0 \wedge \cdots \wedge s_n] + M[s_n \mid s_0 \wedge \cdots \wedge s_n] \\
&= M[y_{n+1} \mid s_0 \wedge \cdots \wedge s_n] + s_n \\
&= M[y_{n+1} \mid y_0 \wedge \cdots \wedge y_n] + s_n \\
&= M[y_{n+1}] + s_n \qquad \text{by independence} \\
&= s_n \quad \text{if and only if } M[y_{n+1}] = 0.
\end{aligned}$$

The fourth equality holds because $s_0, \ldots, s_n$ and $y_0, \ldots, y_n$ determine
each other and hence induce the same cross partition.

A special case in which the $y_n$ have identical distributions is the
sequence of plays of a game of chance. A fair game is one in which the
expected fortune at time $n + 1$, given the history of the game up to time $n$,
is equal to the actual fortune $s_n$ at time $n$. Matching pennies is a fair game,
whereas roulette is unfair. A game like roulette that is favorable to the
house is a supermartingale. From the calculation above, we see that a game is
fair if and only if the mean amount won in each round is zero.
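The computation above can be checked by brute force. The Python sketch below (an illustration under stated assumptions, not part of the text) enumerates all paths of $n + 2$ plays, groups them into the cells determined by $s_0, \ldots, s_n$, and compares the cellwise mean of $s_{n+1}$ with $s_n$. The step distributions are assumptions: $\pm 1$ with equal probability for matching pennies, and $\pm 1$ with probabilities $18/38$ and $20/38$ as a stand-in for an even-money roulette bet.

```python
import operator
from fractions import Fraction
from itertools import accumulate, product


def satisfies(step_dist, n, relation):
    """Check relation(M[s_{n+1} | s_0 ∧ ... ∧ s_n], s_n) on every cell of the partition.

    step_dist maps each value of y_k to its probability; the y_k are taken to be
    independent and identically distributed (an assumption of this sketch).
    """
    cells = {}  # history (s_0, ..., s_n) -> list of (probability, s_{n+1})
    for ys in product(step_dist, repeat=n + 2):
        prob = Fraction(1)
        for y in ys:
            prob *= step_dist[y]
        sums = list(accumulate(ys))  # s_0, s_1, ..., s_{n+1}
        cells.setdefault(tuple(sums[:-1]), []).append((prob, sums[-1]))
    for history, entries in cells.items():
        mass = sum(p for p, _ in entries)
        cond_mean = sum(p * s for p, s in entries) / mass
        if not relation(cond_mean, history[-1]):
            return False
    return True


fair = {1: Fraction(1, 2), -1: Fraction(1, 2)}        # matching pennies
unfair = {1: Fraction(18, 38), -1: Fraction(20, 38)}  # assumed even-money roulette odds

print(satisfies(fair, 3, operator.eq))    # True: M[s_{n+1} | past] = s_n
print(satisfies(unfair, 3, operator.le))  # True: M[s_{n+1} | past] <= s_n
print(satisfies(unfair, 3, operator.eq))  # False
```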

(2) A particle moves on a line stopping at points whose coordinates
are integers. At each step it moves $n$ units to the right with
probability $p_n$, where $n = \ldots, -2, -1, 0, 1, 2, \ldots$; only finitely many of
the $p_n$ are different from zero, $p_n \ne 0$ for some negative value and some
positive value of $n$, and $\sum_n p_n = 1$. The particle's position after $j$
steps is $x_j$. Set $f(s) = \sum_n p_n s^n$, and consider the positive roots of the
equation $f(s) = 1$. Since $f''(s) > 0$ for $s > 0$, there are at most two
roots of the equation, and since $f(1) = 1$, either one or two roots exist.
As $s \to 0$ or $s \to \infty$, $f(s) \to \infty$. Hence either 1 is a minimum point or there
are two positive roots. If 1 is a minimum point, then $f'(1) = 0$, and
hence $\{x_j\}$ is a martingale; and if $r$ is a root other than 1, then $\{r^{x_j}\}$ is a
martingale. The details of verifying these assertions are left to the
reader. We shall need to use these results later.
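The root structure described above can be explored numerically. This Python sketch is a hedged illustration: it represents the step distribution as a dictionary $n \mapsto p_n$, computes $f'(1) = \sum_n n p_n$, and locates the second positive root of $f(s) = 1$ by bisection, using the convexity argument from the text to decide on which side of 1 it lies. The example distributions (left with probability $2/3$, right with probability $1/3$, and its mirror image) are assumptions chosen because their roots, 2 and $1/2$, are easy to verify by hand.

```python
from fractions import Fraction


def f(s, p):
    """Generating function f(s) = sum_n p_n s^n of a finite step distribution p."""
    return sum(pn * s**n for n, pn in p.items())


def mean_step(p):
    """f'(1) = sum_n n p_n, the mean displacement per step."""
    return sum(n * pn for n, pn in p.items())


def other_root(p, tol=1e-12):
    """Positive root r != 1 of f(s) = 1, found by bisection; None if f'(1) = 0.

    By convexity, the other root lies to the right of 1 when f'(1) < 0 and
    to the left of 1 when f'(1) > 0.
    """
    m = mean_step(p)
    if m == 0:
        return None  # 1 is the minimum point; {x_j} itself is a martingale

    def left_of_root(s):
        # Strictly between the two roots f(s) < 1; outside them f(s) >= 1.
        return f(s, p) < 1 if m < 0 else f(s, p) >= 1

    if m < 0:
        lo, hi = 1.0, 2.0
        while f(hi, p) < 1:  # f(s) -> infinity as s -> infinity
            hi *= 2
    else:
        lo, hi = 0.5, 1.0
        while f(lo, p) < 1:  # f(s) -> infinity as s -> 0
            lo /= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if left_of_root(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p = {-1: Fraction(2, 3), 1: Fraction(1, 3)}  # assumed: left w.p. 2/3, right w.p. 1/3
print(mean_step(p), round(other_root(p), 9))  # -1/3 2.0
# f(2) = 1 exactly, so M[2^(x + y)] = 2^x f(2) = 2^x: {2^(x_j)} is a martingale here.
print(f(Fraction(2), p))                      # 1
```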

(3) Let $\Omega$ be the closed unit interval $[0, 1]$ on the real line, let the
measure of an interval be its length, and let $\mathscr{R}_n$ be the partition

$$\{[0, 2^{-n}],\ (2^{-n}, 2 \cdot 2^{-n}],\ \ldots,\ ((2^n - 1)2^{-n}, 1]\}.$$

Let $f$ be a monotone increasing function on $[0, 1]$ and let $f_n$ be a function
which is constant on the interval $(c2^{-n}, (c + 1)2^{-n}]$ and whose value at any
point in the interval is

$$2^n \big(f((c + 1)2^{-n}) - f(c2^{-n})\big).$$

Thus $f_n$ is an approximation to the derivative $f'$, if it exists. The
reader should verify that $(f_n, \mathscr{R}_n)$ is a martingale.
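The martingale property here comes down to the fact that each cell of $\mathscr{R}_n$ splits into two cells of $\mathscr{R}_{n+1}$ of equal length, so $M[f_{n+1} \mid \mathscr{R}_n]$ on a cell is the average of the two difference quotients on its halves, and that average telescopes to the quotient on the whole cell. The following Python sketch checks this exactly with fractions for the assumed monotone function $f(x) = x^3$; the function choice and helper names are hypothetical.

```python
from fractions import Fraction


def f(x):
    """An assumed monotone increasing function on [0, 1]."""
    return x ** 3


def f_n(n, c):
    """Value of f_n on the cell (c 2^-n, (c + 1) 2^-n]: a difference quotient of f."""
    h = Fraction(1, 2 ** n)
    return (f((c + 1) * h) - f(c * h)) / h


def is_martingale_step(n):
    """Check f_n = M[f_{n+1} | R_n]: each cell of R_n is two cells of R_{n+1} of equal length."""
    return all(
        f_n(n, c) == (f_n(n + 1, 2 * c) + f_n(n + 1, 2 * c + 1)) / 2
        for c in range(2 ** n)
    )


print(all(is_martingale_step(n) for n in range(6)))  # True
print(float(f_n(10, 512)))  # about 0.7515, near f'(1/2) = 0.75
```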


3. A first martingale systems theorem 


In the first example in the preceding section, we saw that martingales 
bear some relation to gambling. A fair game is a martingale, a game 
favorable to the house is a supermartingale, and a game favorable to 
the player is a submartingale. A gambling system is a device to take 
advantage of the nature of a game of chance in order to increase the 
player’s expected fortune. Systems theorems are theorems which 
prove that certain classes of gambling systems do not work. For our 
first systems theorem, which we shall need in the proof in the next 
section, we require a lemma. 


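Lemma 3-6: Let $\{\mathscr{R}_n\}$ be an increasing sequence of partitions and
let $\{f_n\}$ be a sequence of random variables. Suppose $R_k$ is a set in the
Borel field $\mathscr{R}_k^*$.

If $(f_n, \mathscr{R}_n)$ is a submartingale, then $\int_{R_k} f_{k+1} \, d\mu \ge \int_{R_k} f_k \, d\mu$.

If $(f_n, \mathscr{R}_n)$ is a martingale, then $\int_{R_k} f_{k+1} \, d\mu = \int_{R_k} f_k \, d\mu$.

If $(f_n, \mathscr{R}_n)$ is a supermartingale, then $\int_{R_k} f_{k+1} \, d\mu \le \int_{R_k} f_k \, d\mu$.

PROOF: We shall prove only the first assertion. Since $R_k$ is a union of
cells of $\mathscr{R}_k$, applying Proposition 3-2 on each such cell and summing gives
$\int_{R_k} f_{k+1} \, d\mu = \int_{R_k} M[f_{k+1} \mid \mathscr{R}_k] \, d\mu$, and since $(f_n, \mathscr{R}_n)$ is a
submartingale, we know that

$$M[f_{k+1} \mid \mathscr{R}_k] \ge f_k.$$

The result follows immediately from integrating this inequality over $R_k$.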


Proposition 3-7: Let $(f_n, \mathscr{R}_n)$ be a submartingale, and suppose that
$\{\epsilon_n\}$ is a sequence of random variables such that $\epsilon_n(\omega) = 1$ or $0$ for every
$\omega$ and such that the set $\{\omega \mid \epsilon_n(\omega) = 1\}$ is a set in $\mathscr{R}_n^*$, the Borel field
generated by $\mathscr{R}_n$. Define

$$\hat{f}_n = f_0 + \epsilon_0(f_1 - f_0) + \epsilon_1(f_2 - f_1) + \cdots + \epsilon_{n-1}(f_n - f_{n-1}).$$

Then $(\hat{f}_n, \mathscr{R}_n)$ is a submartingale and $M[\hat{f}_n] \le M[f_n]$.


REMARK: Analogous results hold for martingales and for super- 
martingales, but we need only what is proved here. 


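PROOF OF PROPOSITION: We first show that $M[\hat{f}_{n+1} \mid \mathscr{R}_n] \ge \hat{f}_n$. We
have

$$\begin{aligned}
M[\hat{f}_{n+1} \mid \mathscr{R}_n] &= M[\hat{f}_n + \epsilon_n(f_{n+1} - f_n) \mid \mathscr{R}_n] \\
&= M[\hat{f}_n \mid \mathscr{R}_n] + M[\epsilon_n(f_{n+1} - f_n) \mid \mathscr{R}_n] \\
&= \hat{f}_n + M[\epsilon_n(f_{n+1} - f_n) \mid \mathscr{R}_n].
\end{aligned}$$

Since $\{\omega \mid \epsilon_n(\omega) = 1\}$ is the union of cells of $\mathscr{R}_n$, $\epsilon_n$ is constant on cells
of $\mathscr{R}_n$. Thus by Proposition 3-3, the above expression

$$\begin{aligned}
&= \hat{f}_n + \epsilon_n M[f_{n+1} - f_n \mid \mathscr{R}_n] \\
&= \hat{f}_n + \epsilon_n\big(M[f_{n+1} \mid \mathscr{R}_n] - f_n\big),
\end{aligned}$$

which is

$$\ge \hat{f}_n$$

because $(f_n, \mathscr{R}_n)$ is a submartingale and $\epsilon_n$ is non-negative. It remains
to be shown that $M[f_n - \hat{f}_n] \ge 0$; we prove the result by induction on $n$.
For $n = 0$, $\hat{f}_0 = f_0$ and $M[f_0 - \hat{f}_0] = 0$. Suppose we have proved that
$\int_\Omega (f_k - \hat{f}_k) \, d\mu \ge 0$. Then we have $\hat{f}_{k+1} = \hat{f}_k + \epsilon_k(f_{k+1} - f_k)$, and
when we subtract both sides from $f_{k+1}$, we get

$$\begin{aligned}
f_{k+1} - \hat{f}_{k+1} &= f_{k+1} - \hat{f}_k - \epsilon_k(f_{k+1} - f_k) \\
&= (1 - \epsilon_k)(f_{k+1} - f_k) + (f_k - \hat{f}_k).
\end{aligned}$$

Thus

$$\begin{aligned}
\int_\Omega (f_{k+1} - \hat{f}_{k+1}) \, d\mu &\ge \int_\Omega (1 - \epsilon_k)(f_{k+1} - f_k) \, d\mu && \text{by hypothesis} \\
&= \int_{\{\epsilon_k(\omega) = 0\}} (f_{k+1} - f_k) \, d\mu \\
&\ge 0 && \text{by Lemma 3-6,}
\end{aligned}$$

since $\{\omega \mid \epsilon_k(\omega) = 0\}$ is also a set in $\mathscr{R}_k^*$.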


To see the connection of Proposition 3-7 to gambling systems, we
again consider Example 1 of Section 3-2. We take the partial sums
$s_n$ as the $f_n$, and we observe that the differences $s_k - s_{k-1}$ in the
definition of $\hat{f}_n$ are simply the random variables $y_k$. The $\hat{f}_n$ become
modified fortunes, fortunes changed by not playing in every round of
the game. When $\epsilon_k = 1$, the player participates in the $(k + 1)$st game;
and when $\epsilon_k = 0$, he does not. The whole set of $\epsilon$'s, therefore,
represents a gambling system; the condition that the set of paths for
which $\epsilon_k(\omega) = 1$ be the union of cells of $\mathscr{R}_k$ is the condition that the
system not depend on any knowledge of the future. For the special
case in which the process is a submartingale, the expected fortune after
time $n + 1$ is greater than or equal to the fortune at time $n$, and the
game is favorable for the player. The content of Proposition 3-7 is
that the player's expected fortune after time $n + 1$ would not have
been increased by a system which caused him not to play in certain
rounds.
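A small exact computation illustrates Proposition 3-7. The Python sketch below assumes a favorable game (win 1 with probability $3/5$, lose 1 with probability $2/5$, starting fortune 0) and a hypothetical system `plays_next_round` that sits out whenever the fortune so far is positive. The system looks only at rounds already played, as the proposition requires. Enumerating all $2^N$ outcomes shows that the modified expected fortune does not exceed the unmodified one.

```python
from fractions import Fraction
from itertools import product

# Assumed favorable game: win 1 w.p. 3/5, lose 1 w.p. 2/5, starting fortune f_0 = 0.
STEP = {1: Fraction(3, 5), -1: Fraction(2, 5)}
N = 6  # number of rounds


def plays_next_round(past_steps):
    """Hypothetical system: play the next round only if the fortune so far is <= 0.

    It looks only at rounds already played, so it uses no knowledge of the future.
    """
    return 1 if sum(past_steps) <= 0 else 0


def expected_fortunes():
    """Return (M[f_N], M[hat f_N]) by enumerating all 2^N outcomes exactly."""
    plain = modified = Fraction(0)
    for ys in product(STEP, repeat=N):
        prob = Fraction(1)
        for y in ys:
            prob *= STEP[y]
        # hat f_N = f_0 + sum_k eps_k (f_{k+1} - f_k), with eps_k decided from ys[:k]
        hat = sum(plays_next_round(ys[:k]) * ys[k] for k in range(N))
        plain += prob * sum(ys)
        modified += prob * hat
    return plain, modified


plain, modified = expected_fortunes()
print(plain, modified, modified <= plain)  # plain is 6/5; the comparison prints True
```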


4. Martingale convergence and a second systems theorem 


In this section we shall prove two theorems which will be of great use 
in our treatment of Markov chains. The two results will indicate the 
value of recognizing martingales when they appear in our later work. 


Definition 3-8: Let $\{f_n\}$ be a sequence of random variables defined on
points $\omega$, and let $r$ and $s$ be numbers with $r < s$. An upcrossing of
$[r, s]$ is said to occur between $n - k$ and $n$ for the point $\omega$ if these
conditions are satisfied:

(1) $f_{n-k}(\omega) \le r$.

(2) $r < f_{n-k+m}(\omega) < s$ for $0 < m < k$.

(3) $f_n(\omega) \ge s$.

The reader should notice that no two upcrossings overlap.

[Figure: an upcrossing of $[r, s]$ by the sequence $f_{n-k}, \ldots, f_n$.]


After three preliminary results, we proceed with a proof of the 
Martingale Convergence Theorem. Proposition 3-11 is known as the 
Upcrossing Lemma. 


Lemma 3-9: If $(f_n, \mathscr{R}_n)$ and $(g_n, \mathscr{R}_n)$ are submartingales, then so is
$(\sup(f_n, g_n), \mathscr{R}_n)$.
PROOF:

$$M[|\sup(f_n, g_n)|] \le M[\sup(|f_n|, |g_n|)] = \int_{\{|f_n| \ge |g_n|\}} |f_n| \, d\mu + \int_{\{|f_n| < |g_n|\}} |g_n| \, d\mu < \infty.$$

The function $\sup(f_n, g_n)$ is clearly constant on cells of $\mathscr{R}_n$ if $f_n$ and $g_n$
each are. Furthermore,

$$M[\sup(f_{n+1}, g_{n+1}) \mid \mathscr{R}_n] \ge M[f_{n+1} \mid \mathscr{R}_n] \ge f_n$$

and

$$M[\sup(f_{n+1}, g_{n+1}) \mid \mathscr{R}_n] \ge M[g_{n+1} \mid \mathscr{R}_n] \ge g_n,$$

so that

$$M[\sup(f_{n+1}, g_{n+1}) \mid \mathscr{R}_n] \ge \sup(f_n, g_n).$$


Lemma 3-10: If $(f_n, \mathscr{R}_n)$ is a martingale, $M[f_n] = M[f_{n-1}]$. If
$(f_n, \mathscr{R}_n)$ is a supermartingale, $M[f_n] \le M[f_{n-1}]$. If $(f_n, \mathscr{R}_n)$ is a
submartingale, $M[f_n] \ge M[f_{n-1}]$.

PROOF: The result is immediate from Lemma 3-6 when $R_k$ is taken
as $\Omega$.


Proposition 3-11: Let $(f_k, \mathscr{R}_k)$ be a submartingale, and let $\beta(\omega)$ be
the number of upcrossings of $[r, s]$ between times 0 and $n$. Then

$$M[\beta] \le \frac{M[(f_n - r)^+]}{s - r} \le \frac{M[|f_n|] + |r|}{s - r}.$$


PROOF: Consider first the special case $f_k \ge 0$ for $0 \le k \le n$ and
$r = 0$. Let $\hat{f}_k$ be defined as in Proposition 3-7 with the $\epsilon$'s to be
specified. For a given path $\omega$, define, by induction on $m$, $\epsilon_m(\omega) = 1$
whenever $f_m(\omega) = 0$, and let $\epsilon_{m+1}(\omega) = 1$ as long as $f_{m+1}(\omega) < s$. As
soon as $f_m(\omega) \ge s$, require that $\epsilon_m(\omega) = 0$ until $m$ is large enough so
that $f_m(\omega) = 0$ again. Then $\hat{f}_n - f_0$ measures the increase in the sequence
$f_0(\omega), \ldots, f_n(\omega)$ during upcrossings (plus a possible “partial upcrossing”
at the end, which contributes a non-negative amount) and is greater than or
equal to the minimum increase in each upcrossing multiplied by the number of
upcrossings. Since $f_0 \ge 0$, it follows that

$$s \cdot \beta(\omega) \le \hat{f}_n(\omega).$$

Hence

$$s \cdot M[\beta] \le M[\hat{f}_n],$$

which by Proposition 3-7 is

$$\le M[f_n].$$

Therefore,

$$M[\beta] \le \frac{1}{s} M[f_n],$$

and the special case is proved. For a general sequence $\{f_k\}$ and general
$r$, consider the function $(f_k - r)^+$, which is the supremum of the zero
function and the function $f_k - r$. It is readily verified that constant
functions are martingales and that the difference of a submartingale and
a martingale is a submartingale. Thus $(f_k - r, \mathscr{R}_k)$ is a submartingale,
and by Lemma 3-9 $((f_k - r)^+, \mathscr{R}_k)$ is a submartingale. The upcrossings of
$[r, s]$ by $\{f_k\}$ are exactly the upcrossings of $[0, s - r]$ by $\{(f_k - r)^+\}$, so
applying the special case proved above to $(f_k - r)^+$, we find

$$\begin{aligned}
(s - r) M[\beta] &\le M[(f_n - r)^+] \\
&\le M[|f_n - r|] \\
&\le M[|f_n| + |r|] \\
&= M[|f_n|] + |r|,
\end{aligned}$$

and the proof of the Upcrossing Lemma is complete.
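The following Python sketch, offered only as an illustration, counts upcrossings as in Definition 3-8 (a value at most $r$, then values strictly between $r$ and $s$, then a value at least $s$) and evaluates the three quantities in the Upcrossing Lemma exactly for a fair $\pm 1$ walk started at 0, which is a martingale and hence a submartingale. The walk, the horizon $n = 10$, and the interval $[-1, 1]$ are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import product


def count_upcrossings(values, r, s):
    """Number of upcrossings of [r, s] (Definition 3-8) by the finite sequence values."""
    count, below = 0, False
    for v in values:
        if v <= r:
            below = True   # the most recent visit to (-inf, r] starts the next upcrossing
        elif below and v >= s:
            count += 1     # upcrossing completed; wait for the next visit to (-inf, r]
            below = False
    return count


def upcrossing_check(n, r, s):
    """Exact M[beta], M[(f_n - r)^+]/(s - r), (M[|f_n|] + |r|)/(s - r) for a fair walk from 0."""
    m_beta = m_plus = m_abs = Fraction(0)
    prob = Fraction(1, 2 ** n)
    for steps in product((1, -1), repeat=n):
        path = [0]
        for y in steps:
            path.append(path[-1] + y)  # f_0, f_1, ..., f_n
        m_beta += prob * count_upcrossings(path, r, s)
        m_plus += prob * max(path[-1] - r, 0)
        m_abs += prob * abs(path[-1])
    return m_beta, m_plus / (s - r), (m_abs + abs(r)) / (s - r)


beta, bound1, bound2 = upcrossing_check(10, -1, 1)
print(beta <= bound1 <= bound2)  # True, as the Upcrossing Lemma asserts
```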


Theorem 3-12: If $(f_n, \mathscr{R}_n)$ is a submartingale with the property that
$M[|f_n|] \le K < \infty$ for all $n$, then

$$\lim_{n \to \infty} f_n(\omega)$$

exists and is finite for almost all points $\omega$.


PROOF: Failure of almost-everywhere convergence means that there
exists a set of points $\omega$ of positive measure for which the sequence
either diverges or has an infinite limit. At least one of two things must
happen. Either $f_n(\omega)$, for each fixed $\omega$ in a set of positive measure, oscillates
infinitely often above and below rationals $r(\omega)$ and $s(\omega)$ with $r(\omega) < s(\omega)$, or else
$|f_n(\omega)|$ diverges to $+\infty$ on a set of positive measure. We consider the
cases separately.

(1) Suppose $|f_n(\omega)|$ diverges to $+\infty$ on a set $E$ of positive measure $m$.
Then by Fatou's Theorem

$$\begin{aligned}
\int_E \liminf_n |f_n(\omega)| \, d\mu &\le \liminf_n \int_E |f_n(\omega)| \, d\mu \\
&\le \liminf_n \int_\Omega |f_n(\omega)| \, d\mu \\
&= \liminf_n M[|f_n|] \\
&\le K.
\end{aligned}$$

But $\liminf_n |f_n(\omega)| = +\infty$ on $E$, and $E$ has positive measure $m$. Thus
the left side of the inequality is infinite, and we have arrived at a
contradiction.

(2) Suppose $f_n(\omega)$ oscillates infinitely often above and below rationals
$r(\omega)$ and $s(\omega)$ on a set of positive measure $m$. Order the set of all pairs
of rationals (which is a denumerable set) and call the $k$th pair $q_k$.
Consider the denumerable family of sets $A_k$ defined by

$$A_k = \{\omega \mid f_n(\omega) \text{ oscillates infinitely often above and below the rationals of the pair } q_k\}.$$

It is possible for more than one set to have the same point in it, but, on
the other hand, every point $\omega$ for which $f_n(\omega)$ oscillates infinitely often
is in some $A_k$. Therefore,

$$\sum_k \mu(A_k) \ge \mu\Big(\bigcup_k A_k\Big) \ge m > 0,$$

and there must exist a $t$ for which $\mu(A_t) > 0$. That is, for every $\omega$ in a
set $A_t$ of positive measure, $f_n(\omega)$ oscillates infinitely often above and
below fixed rationals $r$ and $s$ with $r < s$. Let $\beta_n(\omega)$ be the number of
upcrossings of $[r, s]$ by $f_0(\omega), \ldots, f_n(\omega)$. By Proposition 3-11,

$$M[\beta_n] \le \frac{M[|f_n|] + |r|}{s - r} \le \frac{K + |r|}{s - r} = c \quad \text{for every } n.$$

Furthermore, the $\beta_n$ are non-negative and increase with $n$ to a
function $\beta$, so that $M[\beta] = \lim_n M[\beta_n] \le c$ by the Monotone Convergence
Theorem. But $M[\beta] = +\infty$, since there are infinitely many upcrossings
on a set of positive measure. This contradiction establishes the
Martingale Convergence Theorem.


Corollary 3-13: Every non-negative supermartingale converges to a
finite-valued function almost everywhere. In particular, every
non-negative martingale converges almost everywhere.

PROOF: If $(f_n, \mathscr{R}_n)$ is a non-negative supermartingale, then $(-f_n, \mathscr{R}_n)$
is a submartingale. Since $f_n \ge 0$, $|-f_n| = f_n$, and hence
$M[|-f_n|] = M[f_n] \le M[f_0]$ by Lemma 3-10. Therefore, $\{-f_n\}$ converges almost
everywhere by Theorem 3-12, and so does $\{f_n\}$.


We postpone a discussion of applications of Theorem 3-12 and 
Corollary 3-13 to the next section. We shall find that the corollary is 
used more frequently than the theorem itself. 
A random time which is finite almost everywhere is called a random
stopping time or simply a stopping time. If $t$ is a stopping time, then
the set $\bigcap_{n=0}^{\infty} \{\omega \mid t(\omega) > n\}$ has measure zero. If $\{f_n\}$ is a sequence of
random variables, we define a function $f_t$ almost everywhere by

$$f_t(\omega) = f_n(\omega) \quad \text{if } t(\omega) = n.$$

Since

$$\{\omega \mid f_t(\omega) < c\} = \bigcup_{n=0}^{\infty} \big(\{\omega \mid t(\omega) = n\} \cap \{\omega \mid f_n(\omega) < c\}\big),$$

$f_t$ is a random variable.


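Lemma 3-14: If $(f_n, \mathscr{R}_n)$ is a martingale and if $t$ is a stopping time for
which $\int_\Omega f_t \, d\mu$ exists, then for any $n$

$$\int_\Omega f_t \, d\mu = \int_{\{t \le n\}} f_n \, d\mu + \int_{\{t > n\}} f_t \, d\mu.$$

PROOF: We have

$$\begin{aligned}
\int_\Omega f_t \, d\mu &= \sum_{k=0}^{n} \int_{\{t = k\}} f_t \, d\mu + \int_{\{t > n\}} f_t \, d\mu \\
&= \sum_{k=0}^{n} \int_{\{t = k\}} f_k \, d\mu + \int_{\{t > n\}} f_t \, d\mu,
\end{aligned}$$

which by Lemma 3-6, applied $n - k$ times to the set $\{t = k\}$ (a set in
$\mathscr{R}_j^*$ for every $j \ge k$),

$$\begin{aligned}
&= \sum_{k=0}^{n} \int_{\{t = k\}} f_n \, d\mu + \int_{\{t > n\}} f_t \, d\mu \\
&= \int_{\{t \le n\}} f_n \, d\mu + \int_{\{t > n\}} f_t \, d\mu.
\end{aligned}$$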


Theorem 3-15: If $(f_n, \mathscr{R}_n)$ is a martingale and if $t$ is a stopping time,
then $M[f_t] = M[f_0]$ if

(1) $M[|f_t|] < \infty$, and
(2) $\displaystyle\lim_{n \to \infty} \int_{\{t > n\}} f_n \, d\mu = 0$.

REMARK: Analogous results hold for submartingales and for
supermartingales. Inequalities replace the equality in the conclusion.
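PROOF OF THEOREM: By (1), $\int_\Omega f_t \, d\mu$ exists, so that Lemma 3-14
applies. Thus, for any $n$,

$$\begin{aligned}
\int_\Omega f_t \, d\mu &= \int_{\{t \le n\}} f_n \, d\mu + \int_{\{t > n\}} f_t \, d\mu \\
&= \int_\Omega f_n \, d\mu - \int_{\{t > n\}} f_n \, d\mu + \int_{\{t > n\}} f_t \, d\mu,
\end{aligned}$$

which by Lemma 3-10

$$= \int_\Omega f_0 \, d\mu - \int_{\{t > n\}} f_n \, d\mu + \int_{\{t > n\}} f_t \, d\mu.$$

Using condition (1) together with Corollary 1-17 and the complete
additivity of the integral as a set function, we see that

$$\lim_{n \to \infty} \int_{\{t > n\}} f_t \, d\mu = 0,$$

since the sets $\{t > n\}$ decrease to a set of measure zero. Since
$\int_{\{t > n\}} f_n \, d\mu \to 0$ by hypothesis, we have $\int_\Omega f_t \, d\mu = \int_\Omega f_0 \, d\mu$.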


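Corollary 3-16: Suppose $(f_n, \mathscr{R}_n)$ is a martingale defined on a space $\Omega$
of finite total measure and $t$ is a stopping time. If $|f_n| \le K$ for all $n$,
then $M[f_t] = M[f_0]$.

PROOF: We must show the two conditions of Theorem 3-15 are
satisfied. For (1) we have $|f_t| \le K$ by definition, and hence $f_t$ is
integrable. For (2) we have

$$\begin{aligned}
\Big|\int_{\{t > n\}} f_n \, d\mu\Big| &\le \int_{\{t > n\}} |f_n| \, d\mu \\
&\le \int_{\{t > n\}} K \, d\mu \\
&= K \mu(\{\omega \mid t(\omega) > n\}) \\
&\to 0,
\end{aligned}$$

because the sets $\{t > n\}$ decrease to a set of measure zero and $\mu$ is finite.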


In terms of gambling systems, the result of Lemma 3-10, namely that
for martingales $M[f_n] = M[f_0]$, states that the expected fortune at any
fixed time is equal to the initial fortune if the game is fair.
That is, the fairness of a game is not altered by deciding in advance to
stop playing at some fixed time. But what about a system where the
player stops according to how the game is going? The system he
adopts is represented by the random time $t$, and Theorem 3-15 and
Corollary 3-16 give sufficient criteria for the game still to be fair.
Corollary 3-16 by itself is a general result; it covers the situation, for
example, where the game stops if either the player or the house goes
bankrupt. If the game does stop under such circumstances, the
corollary states that the fairness of the game is not altered by any
stopping system whatsoever. Similar remarks apply to
supermartingales. If the amount of money that a player has is limited, no
system that he adopts for stopping according to how the game is going
will make an unfair game favorable.
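Corollary 3-16 can be seen at work in a small fair game that stops when either player is ruined. In this Python sketch the parameters are hypothetical: the player starts with fortune 2, and play stops at the first time $t$ the fortune is 0 or 5. The distribution of the stopped fortune $f_{\min(n, t)}$ is propagated exactly with fractions; its mean stays equal to 2 at every step (compare Proposition 3-17 below). Since the stopped values are bounded by 5, Corollary 3-16 gives $M[f_t] = 2$, and because $f_t$ is 0 or 5, the probability of reaching 5 first must be $2/5$.

```python
from fractions import Fraction


def stopped_walk(start, top, steps):
    """Distribution of the stopped fair walk f_min(n, t), where t = first hit of 0 or top.

    Returns the means M[f_min(n, t)] for n = 0..steps and the final distribution.
    """
    dist = {start: Fraction(1)}
    means = [Fraction(start)]
    for _ in range(steps):
        new = {}
        for x, p in dist.items():
            if x in (0, top):      # stopped: the fortune stays where it is
                new[x] = new.get(x, 0) + p
            else:                  # fair round: +1 or -1 with probability 1/2 each
                for y in (x - 1, x + 1):
                    new[y] = new.get(y, 0) + p / 2
        dist = new
        means.append(sum(x * p for x, p in dist.items()))
    return means, dist


means, dist = stopped_walk(start=2, top=5, steps=200)
print(all(m == 2 for m in means))  # True: the stopped process keeps mean 2
print(float(dist.get(5, 0)))       # about 0.4, the probability of reaching 5 before 0
```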

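The following proposition is useful in proving that certain random
times are stopping times.

Proposition 3-17: Let $(f_n, \mathscr{R}_n)$ be a martingale, let $t$ be a random
time, and let $\bar{f}_n$ be the stopped process with

$$\bar{f}_n(\omega) = f_{\min(n, t(\omega))}(\omega).$$

Then $(\bar{f}_n, \mathscr{R}_n)$ is a martingale.

PROOF: We first note that $M[|\bar{f}_n|] < \infty$ since $|\bar{f}_n| \le \sum_{j=0}^{n} |f_j|$ and
each $f_j$ is integrable. Next, let $R$ be a cell in $\mathscr{R}_n$ with $\mu(R) \ne 0$. On $R$
we have

$$\begin{aligned}
M[\bar{f}_{n+1} \mid \mathscr{R}_n] &= \frac{1}{\mu(R)} \int_R \bar{f}_{n+1} \, d\mu \\
&= \frac{1}{\mu(R)} \Big[\int_{R \cap \{t \le n\}} \bar{f}_{n+1} \, d\mu + \int_{R \cap \{t > n\}} \bar{f}_{n+1} \, d\mu\Big] \\
&= \frac{1}{\mu(R)} \Big[\int_{R \cap \{t \le n\}} \bar{f}_n \, d\mu + \int_{R \cap \{t > n\}} f_{n+1} \, d\mu\Big]
\end{aligned}$$

by definition of $\bar{f}_n$. Since $(f_n, \mathscr{R}_n)$ is a martingale and $R \cap \{t > n\}$ is in
$\mathscr{R}_n^*$, the above expression by Lemma 3-6 is

$$\begin{aligned}
&\frac{1}{\mu(R)} \Big[\int_{R \cap \{t \le n\}} \bar{f}_n \, d\mu + \int_{R \cap \{t > n\}} f_n \, d\mu\Big] \\
&= \frac{1}{\mu(R)} \Big[\int_{R \cap \{t \le n\}} \bar{f}_n \, d\mu + \int_{R \cap \{t > n\}} \bar{f}_n \, d\mu\Big] \\
&= \frac{1}{\mu(R)} \int_R \bar{f}_n \, d\mu.
\end{aligned}$$

The last expression equals $\bar{f}_n$ since $\bar{f}_n$ is constant on $R$.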


ll 


~ 


The application of Proposition 3-17 is this: Let $(f_n, \mathcal{R}_n)$ be an integer-valued
martingale, and let $S$ be a set of integers. Suppose that, almost surely,
the martingale can be constant from some time on only at values in $S$
(and possibly at no values at all). We stop $f_n$ the first time it takes on a
value in $S$. That is, we let $t$ be the first time that $f_n$ is in $S$, and we
introduce the stopped process $\bar f_n$. If the values of $\bar f_n$ are bounded from
below or from above, then the stopped process is almost sure to stop.
The proof proceeds as follows.

First, assume that $\bar f_n \ge 0$. Then $(\bar f_n, \mathcal{R}_n)$ is a non-negative martingale,
by Proposition 3-17, which must converge a.e. to a finite value
depending on $\omega$. Since the $\bar f_n$ are integer-valued, this means that $\bar f_n(\omega)$
is constant from some time on. By hypothesis these values must be in $S$ for
a.e. $\omega$, and so the process $\{\bar f_n\}$ almost surely stops. Next, if $\{\bar f_n\}$ is bounded
below, apply the result for non-negative martingales to $\bar f_n$ plus a
suitable constant. Finally, if $\bar f_n \le c$, apply the result for $\{\bar f_n\}$ bounded
below to $\{-\bar f_n\}$.

These results are used in the next section in Examples 1 and 4. In 
Example 1 a fair game is stopped when it leaves a certain finite set, 
whereas in Example 4 it is stopped when a positive value is reached for 
the first time. By the above argument these random times are 
stopping times. 


Proposition 3-18: Let $\mathcal{R}_0 \subset \mathcal{R}_1 \subset \cdots$ be an increasing sequence of
partitions and let $\mathcal{R}^*$ be the smallest augmented Borel field containing
the field $\bigcup_n \mathcal{R}_n^*$. Let $f$ be a random variable measurable with respect
to $\mathcal{R}^*$ and having finite mean, and set $g_n = M[f \mid \mathcal{R}_n]$. Then $(g_n, \mathcal{R}_n)$
is a martingale, and
$$\lim_{n \to \infty} g_n = f$$
almost everywhere.


Proof: We may assume that $f \ge 0$, since the general case follows by
considering $f^+$ and $f^-$ separately. Then $g_n \ge 0$ and $M[|g_n|] =
M[f] < \infty$ by conclusion (2) of Proposition 3-3. Since, in addition,
$$M[g_{n+1} \mid \mathcal{R}_n] = M[M[f \mid \mathcal{R}_{n+1}] \mid \mathcal{R}_n]$$
$$= M[f \mid \mathcal{R}_n] \qquad \text{by (1) of Proposition 3-3}$$
$$= g_n,$$
we see that $(g_n, \mathcal{R}_n)$ is a non-negative martingale. By Corollary
3-13, $g = \lim g_n$ exists a.e. We shall prove that $g = f$ a.e.

First we prove that the $g_n$ are uniformly integrable. Let $\epsilon > 0$ be
given. Choose, by Proposition 1-47, $\delta > 0$ small enough so that
$\mu(E) < \delta$ implies $\int_E f \, d\mu < \epsilon$. Now
$$N\mu(\{g_n \ge N\}) \le M[g_n] = M[f],$$
or
$$\mu(\{g_n \ge N\}) \le \frac{M[f]}{N}.$$
Choose $N$ large enough so that the right side is less than $\delta$. Then for
all $n$ we have
$$\int_{\{g_n \ge N\}} f \, d\mu < \epsilon.$$
Since
$$\int_{\{g_n \ge N\}} g_n \, d\mu = \int_{\{g_n \ge N\}} f \, d\mu$$
by Lemma 3-6, we conclude that the $g_n$ are uniformly integrable.
Let $E$ be any set in $\mathcal{R}_n^*$. For $m > n$,
$$\int_E g_m \, d\mu = \int_E f \, d\mu.$$
By uniform integrability and Proposition 1-52,
$$\lim_{m} \int_E g_m \, d\mu = \int_E g \, d\mu.$$
Therefore
$$\int_E f \, d\mu = \int_E g \, d\mu$$
for all $E$ in $\bigcup_n \mathcal{R}_n^*$. The two sides of this last equation, considered as
set functions, are equal on $\bigcup_n \mathcal{R}_n^*$. By the uniqueness half of Theorem
1-19 they must be equal on $\mathcal{R}^*$. That is, $f$ and $g$ are measurable with
respect to $\mathcal{R}^*$ and satisfy
$$\int_E f \, d\mu = \int_E g \, d\mu$$
for every $E$ in $\mathcal{R}^*$. Taking $E$ successively to be the set where $f > g$ and the set where
$f < g$ and applying Corollary 1-40, we find that $f = g$ a.e.


5. Examples of convergent martingales 


Four examples will serve at present to illustrate Theorems 3-12 and 
3-15. Each of the first three refers to the correspondingly numbered 
example in Section 2. 


EXAMPLE 1: The sequence $\{s_n\}$ of $n$th partial sums of the independent
random variables $y_n$ forms a martingale if $M[y_n] = 0$. Suppose the $y_n$
have identical distributions and mean zero, suppose they assume only
the values 0, 1, and $-1$, and suppose that the process is stopped whenever
$s_n(\omega) = M$ or $s_n(\omega) = -N$. (Mean zero implies that the outcomes
$+1$ and $-1$ are equally likely.) The player of the fair game is ruined
if $s_n(\omega)$ ever equals $-N$, and he breaks the bank if $s_n(\omega) = M$. Set

p = Pr[player breaks the bank]
and
q = Pr[player is eventually ruined].

By the remarks following Proposition 3-17, $p + q = 1$. In this
situation, Corollary 3-16 applies and
$$0 = M[s_0] = M[s_t] = pM + q(-N) = pM + (1 - p)(-N),$$
or
$$p = \frac{N}{M + N}$$
and
$$q = \frac{M}{M + N}.$$
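The formula for $p$ can be cross-checked without martingales. The Python sketch below is an illustration, not part of the argument. It solves the first-step equations $h(x) = a\,h(x+1) + (1-2a)\,h(x) + a\,h(x-1)$ for $h(x)$, the probability of reaching $M$ before $-N$ from $x$, with boundary values $h(-N) = 0$ and $h(M) = 1$. The function name and the step weight `a` are assumptions introduced here; `a` is the common probability of $+1$ and of $-1$. Exact rational arithmetic is used so that the result can be compared with $N/(M+N)$ directly.

```python
from fractions import Fraction


def ruin_probabilities(M, N, a):
    """Probability that the fair walk started at 0 reaches M before -N.

    Hypothetical helper: steps are +1 and -1 with probability a each and 0
    with probability 1 - 2a.  Solves the first-step equations
        h(x) = a*h(x+1) + (1-2a)*h(x) + a*h(x-1),  -N < x < M,
    with h(-N) = 0 and h(M) = 1 by exact Gauss-Jordan elimination,
    and returns h(0).
    """
    a = Fraction(a)
    if M < 1 or N < 1 or not (0 < a <= Fraction(1, 2)):
        raise ValueError("need M, N >= 1 and 0 < a <= 1/2")
    states = list(range(-N + 1, M))              # interior states
    idx = {x: k for k, x in enumerate(states)}
    n = len(states)
    # Augmented matrix of 2a*h(x) - a*h(x-1) - a*h(x+1) = (known boundary terms).
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for x in states:
        r = idx[x]
        A[r][r] += 2 * a
        for y in (x - 1, x + 1):
            if y in idx:
                A[r][idx[y]] -= a
            elif y == M:
                A[r][n] += a                     # known value h(M) = 1
            # y == -N contributes a * h(-N) = 0
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return A[idx[0]][n]


print(ruin_probabilities(3, 2, Fraction(1, 4)))  # compare with N/(M+N) = 2/5
```

Changing `a` should not change the answer, in line with the fact that the formula involves only $M$ and $N$.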


EXAMPLE 2: With the particle moving on the line, suppose there is an
$r > 0$ which is a root of the equation $f(s) = 1$. We shall assume that
$r < 1$. Then $\{r^{x_n}\}$ is a non-negative martingale, and by Corollary 3-13,
$\{r^{x_n}\}$ converges to a limiting function a.e. Since $x_n$ is integer-valued,
this convergence means that for almost all $\omega$ either

(1) there is an $N$ such that if $n \ge N$, then $x_n = x_N$, or
(2) $\lim x_n = +\infty$.

Now $x_N = x_{N+1} = x_{N+2} = \cdots = x_{N+k}$ means that the particle fails to
move for $k$ consecutive steps. Such a thing happens with
probability $p_0^k$, which tends to 0 as $k$ increases (as $p_0 < 1$). Hence the
probability that $x_n = x_N$ for all $n \ge N$ is zero. Thus case 1 is eliminated,
and we have established that $\lim x_n = +\infty$ a.e. That is, for any $k$ and for
almost all $\omega$, $x_n(\omega) = k$ for only finitely many $n$.


EXAMPLE 3: When the $f_n$'s are the difference quotients of a monotone
function $f$, the pair $(f_n, \mathcal{R}_n)$ forms a non-negative martingale. By
Corollary 3-13, $f_n$ converges to a limiting function at almost all points.
The limiting function will turn out to be the derivative of $f$. However,
our argument considered only nested partitions, and hence it provides
only part of the proof of the existence of $f'$ a.e.


Next we consider an example where Theorem 3-15 is not applicable. 
EXAMPLE 4: Suppose that a player plays the fair game of Example 1
and that he stops the first time that he is ahead. The process is
stopped when $s_n(\omega) = 1$. We have already seen that this is a stopping
time. Then $s_0 = 0$ and $s_t = 1$ a.e. Hence $M[s_0] = 0 \ne 1 = M[s_t]$.
Why is the theorem not applicable? Condition (1) is clearly satisfied.
However, writing $\bar s_n = s_{\min(n,t)}$ for the stopped process of Proposition 3-17,
we have
$$0 = \int_\Omega \bar s_n \, d\mu = \int_{\{t \le n\}} \bar s_n \, d\mu + \int_{\{t > n\}} \bar s_n \, d\mu.$$
The first term equals the probability that 1 has been reached by time $n$, and
it tends to 1. Hence the second term, which equals $\int_{\{t > n\}} s_n \, d\mu$, tends
to $-1$, not to 0. Thus condition (2) is violated.

In practice this gambling strategy cannot be implemented, since the 
gambler would need infinite capital to be able to absorb arbitrarily 
large losses. 
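The decomposition used in Example 4 can be checked exactly. The Python sketch below assumes, for simplicity, that the steps are $\pm 1$ with probability $\frac12$ each, with no zero steps; this is a special case of Example 1, and the function name is hypothetical. It propagates the distribution of the walk on the event $\{t > n\}$ and reports, for each $n$, the first term $\Pr[t \le n]$ and the second term $\int_{\{t>n\}} s_n \, d\mu$. Their sum should stay exactly 0 while the first term creeps up toward 1.

```python
from fractions import Fraction


def stopped_walk_terms(n_max):
    """Exact terms of  0 = Pr[t <= n] + (integral of s_n over {t > n}).

    Illustrative special case: steps +1 and -1 with probability 1/2 each,
    stopped at the first time t with s_n = 1.  Returns a list of tuples
    (n, first_term, second_term) for n = 0, ..., n_max.
    """
    half = Fraction(1, 2)
    alive = {0: Fraction(1)}     # sub-distribution of s_n on the event {t > n}
    stopped = Fraction(0)        # Pr[t <= n]; on this event the stopped walk is 1
    out = []
    for n in range(n_max + 1):
        second = sum(k * prob for k, prob in alive.items())
        out.append((n, stopped, second))
        nxt = {}
        for k, prob in alive.items():
            for step in (1, -1):
                j = k + step
                if j == 1:
                    stopped += prob * half        # first arrival at 1: stop
                else:
                    nxt[j] = nxt.get(j, Fraction(0)) + prob * half
        alive = nxt
    return out


for n, first, second in stopped_walk_terms(40)[::10]:
    print(n, float(first), float(second))
```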


6. Law of Large Numbers 


The Strong Law of Large Numbers, which may be derived from the 
Martingale Convergence Theorem, is formulated as follows. 


Theorem 3-19: Let $\{y_n\}$ be a sequence of independent identically
distributed random variables with finite mean $a = M[y_n]$. If $s_n =
y_1 + \cdots + y_n$ and $s_n^* = s_n/n$, then
$$\Pr[\lim_n s_n^* = a] = 1.$$

We shall prove the theorem for the special case where the random
variables have finite range; say, $\Pr[y_n = j] = p_j$ for a finite number of
$j$'s. For more generally applicable proofs the reader is referred to the
bibliography. (See Feller [1957], pp. 244-245, for an elementary proof
in the case $y_n$ is denumerable-valued; and see Doob [1953], pp. 334-342,
for a general proof using the Martingale Convergence Theorem.)

We introduce a useful tool, the generating function
$$\varphi(t) = \sum_j p_j t^j.$$
It is a well-behaved function satisfying $\varphi(1) = 1$ and $\varphi'(1) = a$. Let
$$f(k, n) = \frac{t^k}{[\varphi(t)]^n}$$
for some $t > 0$ to be specified. We shall show that $\{f(s_n, n)\}$ is a martingale.
The conditional mean of $f(s_{n+1}, n+1)$ given $s_n = k$ is
$$\sum_j p_j \frac{t^{k+j}}{[\varphi(t)]^{n+1}} = f(k, n) \cdot \frac{\varphi(t)}{\varphi(t)} = f(k, n) = f(s_n, n).$$


Since $M[f(s_0, 0)] = 1 < \infty$, $\{f(s_n, n)\}$ is a martingale, and it is clearly
non-negative. Thus, by the Martingale Convergence Theorem,
$f(s_n, n)$ converges to a finite limit a.e., where
$$f(s_n, n) = \frac{t^{s_n}}{[\varphi(t)]^n} = \left[ \frac{t^{s_n^*}}{\varphi(t)} \right]^n.$$
Fix $\epsilon > 0$, let $b = a + \epsilon$, and form the function $g(t) = t^b/\varphi(t)$. Since
$g(1) = 1$ and $g'(1) = b - a > 0$, we have $g(t_0) > 1$ for some $t_0$ greater than 1
and sufficiently close to 1. Taking $t = t_0$, if $s_n^*(\omega) \ge b$, we have
$$\frac{t_0^{s_n^*}}{\varphi(t_0)} \ge \frac{t_0^b}{\varphi(t_0)} = g(t_0) > 1,$$
and hence if $s_n^*(\omega) \ge b$ for infinitely many $n$, then $f(s_n(\omega), n)$ has a
subsequence tending to $+\infty$. By the convergence of $f(s_n, n)$, we conclude
that
$$\limsup_n s_n^*(\omega) \le b = a + \epsilon$$
a.e. Similarly, by choosing a suitable $t < 1$ we would find that
$$\liminf_n s_n^*(\omega) \ge a - \epsilon$$
a.e. Since $\epsilon$ is arbitrary, $s_n^*$ converges to $a$ with probability one.
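The martingale identity behind this proof is easy to confirm with exact arithmetic. The Python sketch below uses a hypothetical finite-range step law `dist`, a dictionary mapping each value $j$ to $p_j$, and a rational $t$. It defines $f(k, n) = t^k/\varphi(t)^n$ as in the text. Because `Fraction` introduces no rounding, the conditional mean of $f(s_{n+1}, n+1)$ given $s_n = k$ should coincide exactly with $f(k, n)$.

```python
from fractions import Fraction


def phi(dist, t):
    """Generating function phi(t) = sum_j p_j t^j of a finite-range law."""
    return sum(p * t**j for j, p in dist.items())


def f(dist, t, k, n):
    """f(k, n) = t^k / phi(t)^n."""
    return t**k / phi(dist, t)**n


def one_step_mean(dist, t, k, n):
    """Conditional mean of f(s_{n+1}, n+1) given s_n = k."""
    return sum(p * f(dist, t, k + j, n + 1) for j, p in dist.items())


# Hypothetical step law: Pr[y = -1] = 1/2, Pr[y = 0] = 1/6, Pr[y = 2] = 1/3.
dist = {-1: Fraction(1, 2), 0: Fraction(1, 6), 2: Fraction(1, 3)}
t = Fraction(3, 2)

print(phi(dist, Fraction(1)))                          # phi(1) = 1
print(all(one_step_mean(dist, t, k, n) == f(dist, t, k, n)
          for k in range(-3, 4) for n in range(4)))    # prints True
```

The mean of this law is $a = -\frac12 + \frac23 = \frac16$, which is also $\varphi'(1)$. Any positive rational `t` should pass the same check.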


7. Problems 


1. Show that if $\{f_n\}$ are denumerable-valued independent random variables,
then
$$M[f_{n+1} \mid f_0 \wedge \cdots \wedge f_n] = M[f_{n+1}].$$
Show also that
$$\Pr[f_0 \in A_0 \wedge f_1 \in A_1 \wedge \cdots \wedge f_n \in A_n] = \Pr[f_0 \in A_0] \cdots \Pr[f_n \in A_n].$$


2. Let $\{f_n\}$ be a sequence of denumerable-valued independent random
variables and let $\{g_n\}$ be a sequence of Borel-measurable functions
defined on the real line. Show that $\{g_n(f_n)\}$ is a sequence of independent
random variables. If the $f_n$ are identically distributed and all the $g_n$'s
are equal to $g$, show that the $g(f_n)$'s are identically distributed.


3. Verify that Examples 2 and 3 of Section 2 are martingales. 


4. Prove that if $(f_n, \mathcal{R}_n)$ and $(g_n, \mathcal{R}_n)$ are martingales, then so is $(f_n + g_n, \mathcal{R}_n)$.
Does the same hold for $(f_n g_n, \mathcal{R}_n)$?


5. Prove that if $(f_n, \mathcal{R}_n)$ and $(g_n, \mathcal{R}_n)$ are non-negative supermartingales,
then so is $(\min(f_n, g_n), \mathcal{R}_n)$.


6. Prove that if $(f_n, \mathcal{R}_n)$ is a martingale, then $(|f_n|, \mathcal{R}_n)$ is a submartingale.
If the $f_n$ form a martingale on their cross partitions, do the $|f_n|$ form a
submartingale on their own cross partitions?


7. Let $(f_n, \mathcal{R}_n)$ be a submartingale. Prove that, for any $c > 0$,
$$c \Pr\left[\max_{i \le n} f_i > c\right] \le M[|f_n|].$$
[Hint: Take as stopping time the first time $c$ is surpassed, or $n$, whichever
comes first.]
8. Prove that every submartingale can be written as the sum of a martingale
and an increasing submartingale. [Hint: If $(x_n, \mathcal{R}_n)$ is given, put
$f_0 = 0$, $f_n = M[x_n \mid \mathcal{R}_{n-1}] - x_{n-1}$, $z_n = f_0 + \cdots + f_n$, and $y_n = x_n - z_n$.]

9. Consider the following stochastic process: A white and a black ball are
placed in an urn. One ball is drawn, and this ball is replaced by two of
the same color. Let $f_n$ be the fraction of white balls after $n$ experiments.
(a) Prove that $\{f_n\}$ is a martingale.

(b) Prove that it converges.

(c) Prove that the limiting distribution has mean $\frac12$.

(d) Prove that the probability of ever reaching a fraction ł of white 
balls is at most 4. [Hint: Use Problem 7.] 


10. We consider an experiment with each outcome one of two possible outcomes,
H or T. We have two different hypotheses A and B as to how the
underlying measure for a stochastic process should be assigned. For a
given sequence HHT...H we denote by $p_n(HHT\ldots H)$ the assignment
under hypothesis A and by $r_n(HHT\ldots H)$ the assignment under
hypothesis B. Let $f_n$ be defined by
$$f_n(HHT\ldots H) = \frac{p_n(HHT\ldots H)}{r_n(HHT\ldots H)}.$$

(a) Show that if the measure is defined by hypothesis B, then $\{f_n\}$ is a
martingale and hence converges a.e.

(b) Specialize to the case of tossing a biased coin. Let hypotheses A
and B be that the probability of heads is, respectively, $p$ and $r$.
Show that if the measure is defined by hypothesis B, then $\{f_n\}$
converges to 0 a.e. if $p \ne r$.


Problems 11 to 13 concern a type of stochastic process employed by psychologists
in learning theory. The state space consists of the rational points on
the unit interval, and we are given two rational parameters, $0 < b \le a < 1$.
From a point $x$ the process moves to $bx + (1 - a)$ with probability $x$, or to
$bx$ with probability $1 - x$. It is started at an interior point $x_0$.


11. Show that if $b = a$, then $\{x_n\}$ is a martingale.


12. Prove that the process converges either to 0 or to 1, and compute the
probability of going to 1 as a function of the starting position $x_0$.


13. Show that if $b < a$, then $\{x_n\}$ is a supermartingale and the process
converges to 0 a.e.


Problems 14 to 18 concern the notion of conditional mean given a Borel
field and show how it generalizes the notion of conditional mean given a
partition. It will be necessary to know the Radon-Nikodym Theorem to
solve Problem 14. Let $(\Omega, \mathcal{B}, \mu)$ be a probability space in which every
subset of a set of measure zero is measurable.


14. If $f$ is a random variable such that $M[|f|] < \infty$ and if $\mathcal{G}$ is a Borel field
contained in $\mathcal{B}$, show that there exists a function $M[f \mid \mathcal{G}]$ defined on $\Omega$
satisfying

(i) $M[f \mid \mathcal{G}]$ is measurable with respect to $\mathcal{G}'$, the augmented field
obtained from $\mathcal{G}$.


(ii) If $G$ is a set in $\mathcal{G}$, then $\int_G f \, d\mu = \int_G M[f \mid \mathcal{G}] \, d\mu$.
15. Show that $M[f \mid \mathcal{G}]$ is unique in this sense: Any two functions satisfying
(i) and (ii) differ only on a set of $\mu$-measure zero. We can therefore
define any determination of $M[f \mid \mathcal{G}]$ to be the conditional mean of $f$
given $\mathcal{G}$.


16. Show that if $\mathcal{G}$ is the Borel field generated by a partition $\mathcal{R}$, then
$M[f \mid \mathcal{G}] = M[f \mid \mathcal{R}]$ a.e. Show that if $\mathcal{G} = \mathcal{B}$, then $M[f \mid \mathcal{G}] = f$ a.e.
State and prove a result for these conditional means in analogy with
Proposition 3-3. [Hint: In (3), the condition that $g$ be constant on cells
of $\mathcal{R}$ should be replaced with the condition that $g$ be measurable with
respect to $\mathcal{G}$.]


17. Generalize the definition of martingale in Definition 3-5, using these
conditional means. Verify that the statements and proofs of Lemma 3-6,
Proposition 3-7, Lemmas 3-9 and 3-10, Proposition 3-11, Theorem 3-12,
and Corollary 3-13 apply with only minor changes to this generalized
notion of martingale. Perform the same verification for Lemma 3-14,
Theorem 3-15, Corollary 3-16, and Proposition 3-17.


18. Prove the following generalization of Proposition 3-18: Let $\mathcal{G}_0 \subset \mathcal{G}_1 \subset \cdots$
be an increasing sequence of Borel fields in $\mathcal{B}$, let $\mathcal{G}$ be the least Borel
field containing $\bigcup_n \mathcal{G}_n$, and let $f$ be a random variable with $M[|f|] < \infty$.
Then $(M[f \mid \mathcal{G}_n], \mathcal{G}_n)$ is a martingale, and
$$\lim_{n \to \infty} M[f \mid \mathcal{G}_n] = M[f \mid \mathcal{G}]$$
almost everywhere.


CHAPTER 4 


PROPERTIES OF MARKOV CHAINS 


1. Markov chains 


During all of our discussion of Markov chains, we shall wish to 
confine ourselves to stochastic processes defined on a sequence space. 
We have shown that an arbitrary stochastic process may be considered 
as a process on a suitable $\Omega$ in which the outcome functions $f_n$ are
coordinate functions. We see, therefore, that in a sense no generality 
is lost by discussing Markov chains in terms of sequence space. 


Definition 4-1: Let $(\Omega, \mathcal{B}, \mu)$ be a sequence space with a denumerable
stochastic process $\{x_n\}$ defined from $\Omega$ to a denumerable state space $S$
of more than one element. The process is called a denumerable
Markov process if

$$\Pr[x_{n+1} = c_{n+1} \mid x_0 = c_0 \wedge \cdots \wedge x_{n-1} = c_{n-1} \wedge x_n = c_n] = \Pr[x_{n+1} = c_{n+1} \mid x_n = c_n]$$
for any $n$ and for any $c_0, \ldots, c_{n+1}$ such that
$\Pr[x_0 = c_0 \wedge \cdots \wedge x_n = c_n] > 0$.


The condition that defines a Markov process is known as the Markov
property. If a denumerable Markov process has the property that for
any $m$ and $n$ and for any $c_n, c_{n+1}$ such that $\Pr[x_n = c_n] > 0$ and
$\Pr[x_m = c_n] > 0$,
$$\Pr[x_{n+1} = c_{n+1} \mid x_n = c_n] = \Pr[x_{m+1} = c_{n+1} \mid x_m = c_n]$$
holds, then the process is called a denumerable Markov chain. The
condition that defines a Markov chain is called the Markov chain
property.


All Markov chains that we shall discuss will be denumerable. From 
Proposition 2-8 we immediately have the following result. 
Proposition 4-2: The measure on the space for a Markov chain is
completely determined by

(1) the starting probabilities, $\Pr[x_0 = i]$, and
(2) the one-step transition probabilities, the common value of
$\Pr[x_{n+1} = j \mid x_n = i]$ for all $n$ such that $\Pr[x_n = i] > 0$.


If $S$ is the set of states for a Markov chain, we customarily denote
representative elements of $S$ by $i, j, k, \ldots$, and 0. For any Markov
chain we define on the set $S$ a row vector $\pi$ and a square matrix $P$ by
$$\pi_i = \Pr[x_0 = i],$$
$$P_{ij} = \Pr[x_{n+1} = j \mid x_n = i], \quad \text{where } \Pr[x_n = i] > 0.$$

The vector $\pi$ is the starting distribution, and the matrix $P$ is the
transition matrix for the chain. They satisfy the properties $\pi \ge 0$,
$\pi 1 = 1$, and $P \ge 0$. If $\Pr[x_n = i] = 0$ for all $n$, then the $i$th row of
$P$ is not covered by the above definition, and we shall agree to take
$P_{ij} = 0$ for all $j$ in this case.

The definition of $P$ implies that, for each $i$, $(P1)_i = 1$ or 0. It will
be convenient, however, to think of Markov chains from a point of view
which allows $P$ to be any matrix with $P \ge 0$ and $P1 \le 1$. To do so,
we shall admit the possibility that some of the paths in the sequence
space are of finite length. Intuitively a path of finite length is one
along which the Markov chain can “disappear”; the process disappears
from state $i$ with probability $1 - (P1)_i$. Mathematically, paths of
finite length can be introduced as follows: Suppose a Markov chain with
state space $S$ has a distinguished state 0 for which $\pi_0 = 0$ and $P_{00} = 1$.
We shall sometimes identify entry to state 0 with the act of disappearing
in a process with state space $S - \{0\}$ which also will be called a Markov
chain. The transition matrix for the new Markov chain is the same as
the original one except that the 0th row and column are omitted. Any
path in the original process which has 0 as an outcome is now thought of
as a path of finite length which terminates before the first occurrence of
0. The original process can be recovered from the new process by
re-introducing state 0 to the state space and by requiring that the
transition probabilities to state 0 in the original process be the same as
the probabilities of disappearing in the new process. With these conventions,
a Markov chain determines a vector $\pi$ and a matrix $P$ with
$\pi \ge 0$, $\pi 1 = 1$, $P \ge 0$, and $P1 \le 1$.

Conversely, if $\pi$ is a row vector defined on $S$ for which $\pi \ge 0$ and
$\pi 1 = 1$ and if $P$ is a square matrix defined on $S$ for which $P \ge 0$ and
$P1 \le 1$, then $\pi$ and $P$ define a unique Markov chain with state space $S$
by Theorem 2-4. If $(P1)_j < 1$, then the process has positive probability
equal to $1 - (P1)_j$ of disappearing each time it is in state $j$. Whenever
convenient, the act of disappearing may be thought of as entry to an
ideal state adjoined to $S$.

Any state $i$ of a Markov chain $P$ for which $P_{ii} = 1$ is said to be an
absorbing state. If outcome $i$ occurs at some time, the process is said
to enter the absorbing state and to become absorbed. It is easily seen
that once the process has been absorbed, it is impossible for it to leave
the absorbing state.

If $P$ is a Markov chain with starting distribution $\pi$ and if $q$ is a statement
about the process, we denote the probability of $q$ by $\Pr_\pi[q]$. If
$$\pi_k = \begin{cases} 1 & \text{when } k = i \\ 0 & \text{otherwise,} \end{cases}$$
we may alternatively write $\Pr_i[q]$. Similarly, if $f$ is a random variable,
we write $M_\pi[f]$ or $M_i[f]$, depending on the starting distribution. With
this notational convention, we are free to discuss a whole class of
Markov chains at once. The class contains all chains whose transition
matrices are some fixed matrix $P$, and two chains of the class differ
only in their starting distributions. Most of our treatment of Markov
chains will be on this more natural level, where a matrix $P$, but no
distribution $\pi$, is specified.

We conclude this section with a simple but useful proposition. Its 
proof is left to the reader. 


Proposition 4-3: If $P$ is a Markov chain, then for $n \ge 0$,
$$\Pr_i[x_n = j] = (P^n)_{ij}$$
and
$$\Pr_\pi[x_n = j] = (\pi P^n)_j.$$
We shall use the notation $P_{ij}^{(n)}$ for $(P^n)_{ij}$, the $n$-step probability from
$i$ to $j$.


2. Examples of Markov chains 


We give ten examples of Markov chains; we shall refer to all of them 
from time to time. 


EXAMPLE 1: Weather in the Land of Oz. 

The Land of Oz is blessed by many things, but good weather is not 
one of them. They never have two nice days in a row. If they have a 
nice day, they are just as likely to have snow as rain the next day. 
If they have snow (or rain), they have an even chance of having the 
same the next day. If there is a change from snow or rain, only half of 
the time is this a change to a nice day. 
The weather is conveniently represented as a Markov chain with the
three states $S = \{\text{Rain}, \text{Nice}, \text{Snow}\}$. The transition matrix becomes
$$P = \begin{array}{c|ccc} & R & N & S \\ \hline R & \frac12 & \frac14 & \frac14 \\ N & \frac12 & 0 & \frac12 \\ S & \frac14 & \frac14 & \frac12 \end{array}$$
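Proposition 4-3 turns $n$-step probabilities into matrix arithmetic. The Python sketch below applies it to the Land of Oz chain, starting from a nice day. The helper names are hypothetical. Exact `Fraction` arithmetic keeps the entries directly comparable with the matrix above.

```python
from fractions import Fraction as F

STATES = ["Rain", "Nice", "Snow"]
# Transition matrix of Example 1; rows and columns in the order R, N, S.
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 2), F(0),    F(1, 2)],
     [F(1, 4), F(1, 4), F(1, 2)]]


def vec_mat(v, A):
    """Row vector v times matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def distribution_after(pi, A, n):
    """Pr_pi[x_n = j] = (pi P^n)_j, computed by n vector-matrix products."""
    for _ in range(n):
        pi = vec_mat(pi, A)
    return pi


start_nice = [F(0), F(1), F(0)]      # pi concentrated on Nice
for n in (1, 2, 6):
    print(n, dict(zip(STATES, (float(x) for x in distribution_after(start_nice, P, n)))))
```

As $n$ grows the printed vectors approach $(\frac25, \frac15, \frac25)$. One can check by hand that this vector satisfies $\pi = \pi P$ for the matrix above.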


EXAMPLE 2: Chain with a set of states $E$ made absorbing.

Let $P$ be an arbitrary Markov chain with $S$ the set of states. Let a
subset $E$ of $S$ be specified. We modify the original process by requiring
that if the process is ever in a state $j$ of $E$, it does not leave that state.
The new process is also a Markov chain; its transition matrix $P'$ differs
from the $P$-matrix in that $P'_{jj} = 1$ and $P'_{ji} = 0$ for every $j \in E$ and for
every $i \ne j$. The new process is called the chain with $E$ made
absorbing.


EXAMPLE 3: Finite drunkard’s walk.

A drunkard walks randomly on a street between his house and a 
lake, starting at a bar in the middle. He has some idea of which way 
is home. The steps along the way are labeled by the integers from 0 
to n; the bar, some integer i between 0 and n, is the starting state, and 
the drunkard moves one step toward home (state n) with probability p 
and one step toward the lake (state 0) with probability q = 1 — p. 
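States 0 and $n$ are absorbing. We assume that $p \ne 0$ and $p \ne 1$.
The transition matrix is
$$P = \begin{array}{c|cccccccc} & 0 & 1 & 2 & 3 & \cdots & n-2 & n-1 & n \\ \hline 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & q & 0 & p & 0 & \cdots & 0 & 0 & 0 \\ 2 & 0 & q & 0 & p & \cdots & 0 & 0 & 0 \\ \vdots & & & & & \ddots & & & \\ n-1 & 0 & 0 & 0 & 0 & \cdots & q & 0 & p \\ n & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{array}$$

The reader should verify that if $p = \frac12$, then $\{x_n\}$ is a martingale and
that if $p \ne \frac12$, then $\{(q/p)^{x_n}\}$ is a martingale.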


EXAMPLE 4: Infinite drunkard’s walk. 

For this process, which is an extension of the one in Example 3, the 
states are the non-negative integers, and state 0 is absorbing. For 
each i > 0, we have 


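$$P_{i,i+1} = p, \quad P_{i,i-1} = q, \quad \text{and} \quad p + q = 1.$$
We assume $p \ne 0$ and $q \ne 0$. The transition matrix is
$$P = \begin{array}{c|ccccc} & 0 & 1 & 2 & 3 & \cdots \\ \hline 0 & 1 & 0 & 0 & 0 & \cdots \\ 1 & q & 0 & p & 0 & \cdots \\ 2 & 0 & q & 0 & p & \cdots \\ \vdots & & & & & \ddots \end{array}$$

If $p = \frac12$, then $\{x_n\}$ is a martingale, and if $p \ne \frac12$, then $\{(q/p)^{x_n}\}$ is a
martingale.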


EXAMPLE 5: Basic example.

A sequence of tasks is to be performed in a certain order, each with
its own probability of success. Success means that the process goes
to the next state; failure means that the process must start over at state
0. Thus the states are the non-negative integers, and with each
positive integer $i$ we associate two probabilities $p_i$ and $q_i$ such that
$p_i + q_i = 1$. The value $p_i$ is the transition probability from state
$i - 1$ to state $i$, and $q_i$ is the transition probability from state $i - 1$ to
0. Thus $p_i$ is the probability of succeeding in the $i$th task. We
assume that $p_i < 1$ for infinitely many $i$, and we normally assume that
$p_i > 0$ for every $i$. The transition matrix is
$$P = \begin{array}{c|ccccc} & 0 & 1 & 2 & 3 & \cdots \\ \hline 0 & q_1 & p_1 & 0 & 0 & \cdots \\ 1 & q_2 & 0 & p_2 & 0 & \cdots \\ 2 & q_3 & 0 & 0 & p_3 & \cdots \\ \vdots & & & & & \ddots \end{array}$$

In connection with this example, we define a row vector $\beta$ by
$$\beta_0 = 1, \qquad \beta_i = p_1 p_2 \cdots p_i \quad (i \ge 1).$$

Then $\beta_i$ is the probability of $i$ successes in a row after the process starts
at 0. The reader should verify that a necessary and sufficient condition
for $\beta = \beta P$ is that $\lim_{i \to \infty} \beta_i = 0$. This Markov chain will be referred
to hereafter as “the basic example.”
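The verification asked for above takes only a few lines. The following sketch uses nothing beyond the definitions of $P$ and $\beta$. Column $j \ge 1$ of $P$ has the single nonzero entry $P_{j-1,j} = p_j$. Column 0 has the entries $P_{i-1,0} = q_i = 1 - p_i$, so $\beta_{i-1} q_i = \beta_{i-1} - \beta_i$, and the partial sums telescope:

$$(\beta P)_j = \beta_{j-1}\, p_j = \beta_j \qquad (j \ge 1),$$
$$(\beta P)_0 = \sum_{i \ge 1} \beta_{i-1} q_i = \sum_{i \ge 1} (\beta_{i-1} - \beta_i) = \beta_0 - \lim_{i \to \infty} \beta_i = 1 - \lim_{i \to \infty} \beta_i.$$

Since $\beta_i$ is non-increasing, the limit exists. Thus $\beta P$ agrees with $\beta$ in every coordinate except possibly the 0th, and $(\beta P)_0 = \beta_0$ exactly when $\lim_i \beta_i = 0$.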


EXAMPLE 6: Sums of independent random variables.
The states of a Markov chain $P$ are the elements of an index set $I$ on
which an operation of addition is defined in such a way that $I$ becomes
an abelian group. A probability distribution $\{p_k\}$ defined on $I$ satisfies
$p_k \ge 0$ and $\sum_k p_k = 1$. The Markov chain $P$ is defined by $P_{ij} = p_{j-i}$.
The name of this Markov chain is derived from thinking of performing
independent experiments which have probability $p_k$ of outcome
$k$. The states of the chain are the partial sums of these results, and
the sum changes from $i$ to $j$ with probability $P_{ij} = p_k$ if $j = i + k$.
For the case in which the index set $I$ is the set of integers with the
usual concept of addition defined on them, martingales arise as in
Example 1 of Section 3-2. We shall apply these ideas in Chapter 5.


EXAMPLE 7: Two classes of random walks. 

We shall be concerned especially with two kinds of random walks. 
The symmetric random walk in $n$ dimensions is defined to be a sums of
independent random variables process on the lattice of integer points
in $n$-dimensional Euclidean space. The transition probability from
one lattice point to another is $(2n)^{-1}$ if the two points are a Euclidean
distance of one unit apart; the transition probability is zero otherwise.
Thus, from each point the process moves to one of $2n$ neighboring
points with probability $(2n)^{-1}$.

A second kind of random walk with which we shall be concerned is a
sums of independent random variables process on the integers with
$P_{i,i+1} = p$ and $P_{i,i-1} = q$ for every $i$. We shall call this process the
$p$-$q$ random walk. If $p = q = \frac12$, then $\{x_n\}$ is a martingale, and if
$p \ne \frac12$, then $\{(q/p)^{x_n}\}$ is a martingale.


EXAMPLE 8: General random walks on the line. 

The state space for a random walk on the line is the set of integers,
and for each integer $i$, three probabilities $p_i$, $q_i$, and $r_i$ with
$p_i + q_i + r_i = 1$ are specified. A Markov chain is defined by
$$P_{i,i+1} = p_i, \qquad P_{i,i-1} = q_i, \qquad P_{ii} = r_i.$$

The drunkard’s walk and the $p$-$q$ random walk are both special cases.

An important case of random walks on the line which we have not
discussed yet is the reflecting random walk. For this chain the process
is started at a state which is a non-negative integer, and the assumption
is made that $q_0 = 0$. The process never reaches the negative integers,
and the state space may just as well be taken as $\{0, 1, 2, \ldots\}$.


EXAMPLE 9: Branching process. 
The state space is the set of non-negative integers, and a fixed
probability distribution $p = \{p_0, p_1, p_2, \ldots\}$ is specified. Suppose the
mean $\sum_k k p_k$ of $p$ is $m$. Let $\{y_k\}$ be a sequence of non-negative integer-valued
independent random variables with common distribution $p$, and
set $s_n = y_1 + \cdots + y_n$. Let $p_j^{(n)} = \Pr[s_n = j]$. Then the branching
process is defined to be a Markov chain with transition probabilities
$P_{ij} = p_j^{(i)}$.

The usual model is the following. A species of bacteria has the
distribution $p$ representative of the number of offspring one such
bacterium has before it dies. The value of $y_k$ represents the number of
offspring the $k$th bacterium has while it is alive, and $s_n$ represents the
total number of bacteria produced by $n$ bacteria in one generation of
the colony. The $r$th position, $x_r$, of the stochastic process is the
number of bacteria in the $r$th offspring generation.

As we have noted, the branching process is a Markov chain. Let
$\{x_n\}$ be the outcome functions for the chain started in state 1 (that is,
with one bacterium in the colony initially), and suppose the mean $m$ is
finite. Then $\{x_n/m^n\}$ with its natural partition forms a martingale.
The reader should verify that $M[|x_n/m^n|]$ is finite; we shall show that
$M[x_{n+1}/m^{n+1} \mid x_1/m \wedge \cdots \wedge x_n/m^n] = x_n/m^n$. First we note that
$$M[s_n] = \sum_{k=1}^{n} M[y_k] = nm,$$
so that if we know that the process is in state $r$, then the mean state
that it is in after the next step is $rm$. Then
$$M[x_{n+1}/m^{n+1} \mid x_1/m \wedge \cdots \wedge x_n/m^n]$$
$$= M[x_{n+1}/m^{n+1} \mid x_n/m^n] \qquad \text{by the Markov property}$$
$$= M[x_{n+1}/m^{n+1} \mid x_n]$$
$$= (1/m^{n+1}) M[x_{n+1} \mid x_n]$$
$$= (1/m^{n+1}) x_n m \qquad \text{by the remarks above}$$
$$= x_n/m^n.$$
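The identity $M[s_n] = nm$ drives the computation above, and it can be checked on the transition matrix itself. The Python sketch below builds row $i$ of $P$ as the $i$-fold convolution of a hypothetical offspring distribution `p`. It then compares the mean of that row with $im$, using exact fractions. Row 0 is $(1, 0, 0, \ldots)$, reflecting the fact that $s_0 = 0$.

```python
from fractions import Fraction


def convolve(a, b):
    """Distribution of the sum of two independent variables with pmfs a and b."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def branching_row(p, i):
    """Row i of the branching-process matrix: P_ij = p_j^{(i)} = Pr[s_i = j]."""
    row = [Fraction(1)]                  # s_0 = 0 with probability one
    for _ in range(i):
        row = convolve(row, p)
    return row


p = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]   # illustrative offspring law
m = sum(k * pk for k, pk in enumerate(p))              # mean number of offspring
for i in range(5):
    row = branching_row(p, i)
    mean_next = sum(j * pj for j, pj in enumerate(row))
    print(i, mean_next, i * m)                         # the two columns agree
```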


EXAMPLE 10: Tree process. 

Let $\{x_n\}$ be a denumerable stochastic process defined on sequence
space, and let $S$ be the set of states. Define a set $T$ to be the set of all
finite sequences of elements in $S$. Define a new stochastic process as
follows: If $t$ and $u$ are elements of $T$ for which
$$t = (c_0, c_1, c_2, \ldots, c_n)$$
and
$$u = (c_0, c_1, c_2, \ldots, c_n, c_{n+1}),$$
define
$$\Pr[y_{n+1} = u \mid y_n = t] = \Pr[x_{n+1} = c_{n+1} \mid x_0 = c_0 \wedge \cdots \wedge x_n = c_n].$$
The process $\{y_n\}$ defined from the same space to the set $T$ is a Markov
chain; the entire history of the original process up to time $n$ is
contained in the knowledge of the value of the $n$th outcome function $y_n$ for
the new process.

An example of a tree process is obtained by considering an individual’s
voting history in successive years. Letting D and R represent the
political parties, we see that his possible histories can be conveniently
represented as a tree:

[Figure: tree of possible voting histories; each node is a finite sequence of D's and R's, such as DR, RR, RDR, and RDD.]


The chain is in each state—D, R, DD, DRR, etc.—at most once. 


3. Applications of martingale ideas 


Let $P$ be the transition matrix for a Markov chain. A column vector
$f$ is said to be a $P$-regular function, or simply a regular function, if
$f = Pf$. The function is superregular if $f \ge Pf$; it is subregular if
$f \le Pf$.

The reader should convince himself that the regularity of a function 
is a condition of the following form: At each point of the domain, the 
value of the function is equal to the average value of the function at 
neighboring points. By neighboring points we mean those states that 
it is possible for the process to reach in one step, and by average value 
we mean the average obtained by weighting the function values at 
neighboring points by the transition probabilities to those states. A 
function $f$ is said to be regular at a point $j$ if $f_j = (Pf)_j$.

Regular measures may be defined analogously with regular functions. 
A non-negative row vector $\pi$ is a regular measure if $\pi = \pi P$; it is
superregular if $\pi \ge \pi P$ and subregular if $\pi \le \pi P$.

Let $h$ be a $P$-regular function and let $h(x_n)$ denote $h_j$ if $x_n = j$.
Suppose $M[|h(x_n)|]$ is finite. We shall show that $(h(x_n), \mathcal{R}_n)$ is a
martingale, where $\mathcal{R}_n$ is the cross partition
determined by the outcome functions $x_0, x_1, \ldots, x_n$ for the Markov
chain. It is sufficient to show that
$$M[h(x_{n+1}) \mid x_0 \wedge \cdots \wedge x_n] = h(x_n).$$
On the cell of the cross partition where $x_0 = i, \ldots, x_n = j$ and where
$\Pr[x_0 = i \wedge \cdots \wedge x_n = j] > 0$,
$$M[h(x_{n+1}) \mid x_0 \wedge \cdots \wedge x_n] = M[h(x_{n+1}) \mid x_0 = i \wedge \cdots \wedge x_n = j]$$
$$= \sum_k \Pr[x_{n+1} = k \mid x_0 = i \wedge \cdots \wedge x_n = j] \, h_k$$
$$= \sum_k P_{jk} h_k \qquad \text{by the Markov chain property}$$
$$= h_j \qquad \text{since } h \text{ is regular}$$
$$= h(x_n).$$


Thus $(h(x_n), \mathcal{R}_n)$ is a martingale. Similarly, superregular functions are
associated naturally with supermartingales and subregular functions 
correspond to submartingales. The proofs differ from the above proof 
only by insertion of the appropriate inequality sign in the next to the 
last step. 
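Together with Example 3, this argument reduces the reader's verification there to the check $Ph = h$. The Python sketch below performs that check for the finite drunkard's walk with exact arithmetic. The helper names and the particular $n$ and $p$ are illustrative choices. It also shows that $h_i = i$ is not regular when $p = \frac13$. In that case $h_i = i$ is superregular, which corresponds to a supermartingale.

```python
from fractions import Fraction


def drunkard_matrix(n, p):
    """Transition matrix of the finite drunkard's walk on 0, 1, ..., n (Example 3)."""
    p = Fraction(p)
    q = 1 - p
    P = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    P[0][0] = P[n][n] = Fraction(1)          # states 0 and n are absorbing
    for i in range(1, n):
        P[i][i + 1] = p                      # one step toward home
        P[i][i - 1] = q                      # one step toward the lake
    return P


def mat_vec(P, h):
    """The column vector Ph."""
    return [sum(P[i][k] * h[k] for k in range(len(h))) for i in range(len(P))]


def classify(P, h):
    """Compare h with Ph coordinatewise."""
    Ph = mat_vec(P, h)
    if Ph == h:
        return "regular"
    if all(x >= y for x, y in zip(h, Ph)):
        return "superregular"
    if all(x <= y for x, y in zip(h, Ph)):
        return "subregular"
    return "neither"


n, p = 6, Fraction(1, 3)
ratio = (1 - p) / p                          # q/p
print(classify(drunkard_matrix(n, p), [ratio**i for i in range(n + 1)]))  # regular
print(classify(drunkard_matrix(n, Fraction(1, 2)), list(range(n + 1))))  # regular
print(classify(drunkard_matrix(n, p), list(range(n + 1))))                # superregular
```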

Most of our applications of martingale ideas we shall leave to the 
next few chapters. We shall, however, settle some things about 
branching processes at this time. Let $\{x_n\}$ be the outcome functions
for a branching process started in state 1, and suppose the mean
$m = \sum_k k p_k$ is finite. As we noted in Section 2, $\{x_n/m^n\}$ forms a non-negative
martingale, which by Corollary 3-13 converges almost everywhere
to a finite limiting function $g$. One can show that $g$ is not a
constant function; that is, the value of the limit of $\{x_n(\omega)/m^n\}$ very
much depends upon the early history of the path. The exact distri- 
bution of g, however, is an unsolved problem. 

On the other hand, information about whether the process dies out 
(by being absorbed at state 0) is not hard to obtain. Let $g(s) =
\sum_j p_j s^j$ and suppose $r = g(r)$, $r \ge 0$, and $r \ne 1$. Then $\{r^{x_n}\}$ is a non-negative
martingale. First suppose $r > 1$. Since $\{r^{x_n}\}$ is a non-negative
martingale, it converges to a finite limit almost everywhere,
and since $r > 1$, $x_n$ itself converges with probability one. Since $x_n$ is
integer-valued, $x_n$ is constant on almost all paths from some point on.
It is left as an exercise to show that the constant must be zero and that
the process therefore dies out with probability one. Next suppose
$r < 1$. Then $\{r^{x_n}\}$ is bounded and converges almost everywhere to a
limiting function $r^{x_\infty}$, which must be 0 or 1 (that is, $x_\infty = \infty$ or 0)
almost everywhere. By dominated convergence we have
$$r = M[r^{x_0}] = M[r^{x_\infty}] = 1 \cdot \Pr[\text{process dies out}] + 0 \cdot \Pr[\text{process does not die out}],$$
so that $r$ is the probability that the process dies out in the long run.
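For a concrete offspring distribution, the root $r$ can be located numerically. The Python sketch below relies on a standard fact not proved in the text. The $n$-fold composition of $g$, evaluated at 0, equals $\Pr_1[x_n = 0]$, and these values increase to the smallest non-negative root of $s = g(s)$, the probability of dying out. The function names, the tolerance, and the offspring distribution are illustrative assumptions. In the critical case $m = 1$ the iteration converges slowly, so the iteration cap matters there.

```python
def offspring_gf(p, s):
    """g(s) = sum_j p_j s^j for a finite offspring distribution p = [p_0, p_1, ...]."""
    return sum(pj * s**j for j, pj in enumerate(p))


def extinction_probability(p, tol=1e-12, max_iter=100_000):
    """Iterate s <- g(s) starting from s = 0.

    The n-th iterate is Pr_1[x_n = 0]; the iterates increase to the
    smallest non-negative root of s = g(s).
    """
    s = 0.0
    for _ in range(max_iter):
        s_next = offspring_gf(p, s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


p = [0.25, 0.25, 0.5]          # illustrative law with mean m = 1.25 > 1
print(extinction_probability(p))
```

For this `p`, the equation $s = g(s)$ reads $2s^2 - 3s + 1 = 0$, with roots $\frac12$ and $1$. The iteration should therefore approach $\frac12$, the root $r < 1$ of the text.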
Finally, suppose $r = 1$ is the only non-negative root of $s = g(s)$.
Then $m = 1$ and $\{x_n\}$ is a non-negative martingale. Once again we
must have $x_n \to 0$ with probability one, and the process is almost
certain to die out.

The reader should notice that the case $r = 1$ has the property that
$M[x_n] = 1$ for all $n$, whereas $M[\lim x_n] = 0$. The process in this case is
an example of a fair game whose final expected fortune is strictly less 
than the starting fortune. 
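The fair-game phenomenon can be checked exactly for one hypothetical critical law, $p_0 = p_2 = 1/2$ (so $m = 1$). The sketch below, an illustration of ours, composes generating functions in exact rational arithmetic: if $G_n$ is the generating function of $x_n$, then $G_{n+1}(s) = g(G_n(s)) = \tfrac12 + \tfrac12 G_n(s)^2$. It shows $M[x_n] = 1$ for every $n$ computed, while $\Pr[x_n = 0]$ keeps increasing toward 1.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def generation_law(n):
    """Exact law of x_n for the offspring law p0 = p2 = 1/2, with x_0 = 1.

    Entry k of the returned list is Pr[x_n = k].
    """
    G = [Fraction(0), Fraction(1)]            # G_0(s) = s, since x_0 = 1
    for _ in range(n):
        G = [c / 2 for c in poly_mul(G, G)]   # G(s)^2 / 2
        G[0] += Fraction(1, 2)                # ... + 1/2
    return G


for n in range(1, 7):
    law = generation_law(n)
    mean = sum(k * pk for k, pk in enumerate(law))
    print(n, mean, law[0])  # the mean stays 1 while Pr[x_n = 0] grows
```

Exact fractions are used so that the equality $M[x_n] = 1$ is checked without rounding error; the support of $x_n$ doubles each generation, so only small $n$ are practical this way.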


4. Strong Markov property 


The strong Markov property is a rigorous formulation of the following 
assertion about a Markov chain: If the present is known, then the 
probability of any statement depending on the future is independent 
of what additional information about the past is known. In this 
section we shall state and prove this result; our procedure will be first 
to prove a conceptually simpler special case and then to obtain the 
general theorem as an easy consequence. In the special case the time 
of the present will be a fixed time n, whereas in the general case the time 
of the present will be allowed to depend on the past history of the 
process. That is, the time of the present will be a random time. 
Knowledge of the present, then, means knowledge of the outcome at the 
time of the present. 

If $\omega = (c_0, c_1, c_2, \ldots, c_{n-1}, c_n, c_{n+1}, \ldots)$ is a point in a sequence
space, we agree to call the path

$$(c_n, c_{n+1}, \ldots)$$

by the name $\omega_n$.


Lemma 4-4: Let $\{p_k\}$ be a sequence of statements whose truth sets
are disjoint in pairs, let $\bigvee_k p_k$ be their disjunction, and suppose
$\Pr[\bigvee_k p_k] > 0$. If $p$ is a statement for which $\Pr[p \mid p_k] = c$ whenever
$\Pr[p_k] > 0$, then $\Pr[p \mid \bigvee_k p_k] = c$.


PROOF: For each $k$ (both sides being zero when $\Pr[p_k] = 0$),

$$c \cdot \Pr[p_k] = \Pr[p \wedge p_k].$$

Thus

$$c \sum_k \Pr[p_k] = \sum_k \Pr[p \wedge p_k].$$

Since the $p_k$ are disjoint statements, it follows from complete additivity
that $\sum_k \Pr[p_k] = \Pr[\bigvee_k p_k]$ and that $\sum_k \Pr[p \wedge p_k] = \Pr[p \wedge (\bigvee_k p_k)]$.
Thus $c = \Pr[p \wedge (\bigvee_k p_k)]/\Pr[\bigvee_k p_k]$ and the lemma follows.
Throughout the remainder of this section let $(\Omega, \mathcal{B}, \mu)$ be a fixed
sequence space and $\{x_n\}$ a fixed denumerable Markov chain defined on
$\Omega$. The field of cylinder sets will be denoted by $\mathcal{F}$, as in Section 2-1,
and the smallest Borel field containing $\mathcal{F}$ will be called $\mathcal{B}$.


Definition 4-5: The tail-field $\mathcal{T}_n$ is the smallest augmented Borel
field containing all truth sets of statements $x_n = c_n \wedge \cdots \wedge x_m = c_m$,
$m \ge n$. [Thus $\mathcal{T}_0 = \mathcal{B}$ and $\mathcal{T}_n \subset \mathcal{T}_{n-1}$.]


A statement relative to the field $\mathcal{F}_n$ defined in Section 2-1 is one whose
truth set depends only on outcomes $x_0, \ldots, x_n$, whereas a statement
relative to $\mathcal{T}_n$ is one whose truth set does not depend on outcomes
$x_0, \ldots, x_{n-1}$. Specifically, a set $R$ in $\mathcal{B}$ is in $\mathcal{T}_n$ if and only if, whenever
$\omega \in R$ and $\omega'$ is such that $\omega'_n = \omega_n$, then $\omega' \in R$.

We note that the class of sets $\mathcal{T}_n \cap \mathcal{F}$, being the intersection of
fields, is again a field. Moreover, $\mathcal{T}_n$ is the smallest augmented Borel
field containing $\mathcal{T}_n \cap \mathcal{F}$, so that the uniqueness statement of Theorem
1-19 applies: a probability measure on $\mathcal{T}_n$ is completely determined by
its values on $\mathcal{T}_n \cap \mathcal{F}$.


Lemma 4-6: Let $\{x_n\}$ be a Markov chain with starting distribution $\pi$,
let $q$ be a statement relative to $\mathcal{F}_{n-1}$, let $r$ be a statement relative to
$\mathcal{T}_{n+1} \cap \mathcal{F}$, and suppose $\Pr_\pi[q \wedge x_n = i] > 0$. Then

$$\Pr_\pi[r \mid q \wedge x_n = i] = \Pr_\pi[r \mid x_n = i] = \Pr_i[r'],$$

where $r'$ is so chosen that $\omega \in R$ if and only if $\omega_n \in R'$.

Remark: Such an $r'$ exists (and is unique), since $r$ is a statement
relative to $\mathcal{T}_{n+1} \cap \mathcal{F}$.

PROOF:


Case 1: $r$ is of the form $x_{n+1} = j$. Write $q$ as a disjunction $q = \bigvee_m q_m$,
where

$$q_m: \; x_0 = c_0^{(m)} \wedge \cdots \wedge x_{n-1} = c_{n-1}^{(m)}.$$

For each $m$ such that $\Pr_\pi[q_m \wedge x_n = i] > 0$,

$$\Pr_\pi[r \mid q_m \wedge x_n = i] = \Pr_\pi[x_{n+1} = j \mid q_m \wedge x_n = i] = P_{ij}$$

by Definition 4-1. Hence, since $\Pr_\pi[q \wedge x_n = i] > 0$,

$$\Pr_\pi[r \mid q \wedge x_n = i] = P_{ij} = \Pr_\pi[r \mid x_n = i]$$

by Lemma 4-4. Taking $r'$ as $x_1 = j$, we have

$$P_{ij} = \Pr_i[x_1 = j] = \Pr_i[r'].$$
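Case 2: $r$ is of the form $x_{n+1} = c_{n+1} \wedge \cdots \wedge x_m = c_m$, $m > n$. We
have

$$\begin{aligned}
\Pr_\pi[r \mid q \wedge x_n = i] ={}& \Pr_\pi[x_{n+1} = c_{n+1} \mid q \wedge x_n = i] \\
&\times \Pr_\pi[x_{n+2} = c_{n+2} \mid q \wedge x_n = i \wedge x_{n+1} = c_{n+1}] \\
&\times \cdots \times \Pr_\pi[x_m = c_m \mid q \wedge x_n = i \wedge \cdots \wedge x_{m-1} = c_{m-1}].
\end{aligned}$$

The general factor on the right is

$$\Pr_\pi[x_{n+k+1} = c_{n+k+1} \mid q \wedge x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}].$$

First, suppose that none of these factors is zero. Then we may apply
Case 1 with $n + k$ in place of $n$ and $q \wedge x_n = i \wedge \cdots \wedge x_{n+k-1} = c_{n+k-1}$
in place of $q$. The $q$'s drop out of the conditions, and the product
of the new conditional probabilities is $\Pr_\pi[r \mid x_n = i]$.

Next, suppose that at least one of the factors is zero; let the first
such factor from the left be

$$\Pr_\pi[x_{n+k+1} = c_{n+k+1} \mid q \wedge x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}].$$

We must show that $\Pr_\pi[r \mid x_n = i] = 0$. If $k = 0$, then by Case 1

$$0 = \Pr_\pi[x_{n+1} = c_{n+1} \mid q \wedge x_n = i] = \Pr_\pi[x_{n+1} = c_{n+1} \mid x_n = i],$$

and hence $\Pr_\pi[r \mid x_n = i] = 0$. If $k > 0$, then, since the earlier factors are
not zero,

$$\Pr_\pi[q \wedge x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}] > 0,$$

and Case 1 gives

$$\begin{aligned}
0 &= \Pr_\pi[x_{n+k+1} = c_{n+k+1} \mid q \wedge x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}] \\
&= \Pr_\pi[x_{n+k+1} = c_{n+k+1} \mid x_n = i \wedge \cdots \wedge x_{n+k} = c_{n+k}].
\end{aligned}$$

Hence $\Pr_\pi[r \mid x_n = i] = 0$.

Finally $r'$ is the statement $x_1 = c_{n+1} \wedge \cdots \wedge x_{m-n} = c_m$, and, since
$\Pr_\pi[x_n = i] \ge \Pr_\pi[q \wedge x_n = i] > 0$, we have

$$\Pr_\pi[r \mid x_n = i] = P_{i c_{n+1}} P_{c_{n+1} c_{n+2}} \cdots P_{c_{m-1} c_m} = \Pr_i[r'].$$


Case 3: $r$ is arbitrary in $\mathcal{T}_{n+1} \cap \mathcal{F}$. A general statement $r$ reduces
to the denumerable disjoint union of statements of the type in Case 2, and the
result follows from the complete additivity of the probability measure.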


The lemma to follow is the strong Markov property for the case in 
which the time of the present is a fixed time n. 




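Lemma 4-7: Let $\{x_n\}$ be a Markov chain with starting distribution $\pi$,
let $q$ be a statement relative to $\mathcal{F}_n$, let $r$ be a statement relative to $\mathcal{T}_n$,
and suppose $\Pr_\pi[q \wedge x_n = i] > 0$. Then

$$\Pr_\pi[r \mid q \wedge x_n = i] = \Pr_\pi[r \mid x_n = i] = \Pr_i[r'],$$

where $\omega \in R$ if and only if $\omega_n \in R'$.


PROOF: Write

$$q = \bigvee_m \left(x_0 = c_0^{(m)} \wedge \cdots \wedge x_{n-1} = c_{n-1}^{(m)} \wedge x_n = c_n^{(m)}\right).$$

If we set $q^* = \bigvee_m \left(x_0 = c_0^{(m)} \wedge \cdots \wedge x_{n-1} = c_{n-1}^{(m)}\right)$, where the
disjunction is taken over just those $m$ such that $c_n^{(m)} = i$, then

$$(q^* \wedge x_n = i) \equiv (q \wedge x_n = i)$$

and $q^*$ is a statement relative to $\mathcal{F}_{n-1}$. In the special case where $r$ is
relative to $\mathcal{T}_n \cap \mathcal{F}$, we may write

$$r = \bigvee_m \left(x_n = c_n^{(m)} \wedge \cdots \wedge x_N = c_N^{(m)}\right) \quad (N \text{ fixed})$$

and

$$r^* = \bigvee_m \left(x_{n+1} = c_{n+1}^{(m)} \wedge \cdots \wedge x_N = c_N^{(m)}\right),$$

with the second disjunction taken over only those $m$ such that $c_n^{(m)} = i$.
Then

$$\Pr_\pi[r \mid q \wedge x_n = i] = \Pr_\pi[r^* \mid q^* \wedge x_n = i],$$

$$\Pr_\pi[r \mid x_n = i] = \Pr_\pi[r^* \mid x_n = i],$$

and

$$\Pr_i[r'] = \Pr_i[r^{*\prime}].$$

By Lemma 4-6,

$$\Pr_\pi[r^* \mid q^* \wedge x_n = i] = \Pr_\pi[r^* \mid x_n = i] = \Pr_i[r^{*\prime}].$$

Hence

$$\Pr_\pi[r \mid q \wedge x_n = i] = \Pr_\pi[r \mid x_n = i] = \Pr_i[r'].$$


We have thus established the lemma for every $r$ measurable with
respect to $\mathcal{T}_n \cap \mathcal{F}$. But $\mathcal{T}_n$ is the smallest augmented Borel field
containing $\mathcal{T}_n \cap \mathcal{F}$, and by Theorem 1-19 any two measures on $\mathcal{T}_n$
which agree on $\mathcal{T}_n \cap \mathcal{F}$ must agree on all of $\mathcal{T}_n$. Thus

$$\Pr_\pi[r \mid q \wedge x_n = i], \quad \Pr_\pi[r \mid x_n = i], \quad \text{and} \quad \Pr_i[r'],$$

which define such measures as $r$ varies, are equal for every $r$ measurable
with respect to $\mathcal{T}_n$.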


Turning to the general case of the strong Markov property, let $t$ be a
random time. We define $\omega_t$ pointwise to be $\omega_n$ at all points where
$t(\omega) = n$. We do not define $\omega_t$ if $t(\omega) = \infty$. Similarly the outcome
function $x_t$ is defined to be $x_n(\omega)$ if $t(\omega) = n$, and it is not defined for
$t(\omega) = \infty$.


Definition 4-8: The field $\mathcal{F}_t$ is the Borel field of all sets $A$ such that,
for each $n$, $A \cap \{\omega \mid t(\omega) = n\}$ is in $\mathcal{F}_n$. The tail-field $\mathcal{T}_t$ is the
smallest augmented Borel field containing all truth sets of statements

$$x_t = c_0 \wedge \cdots \wedge x_{t+k} = c_k, \qquad k \ge 0.$$


A statement $q$ relative to $\mathcal{F}_t$ is one such that, for each $n$, the
statement $q \wedge t = n$ depends only on outcomes $x_0, \ldots, x_n$. A statement $r$
relative to $\mathcal{T}_t$ is one whose truth set does not depend on outcomes
before time $t$. Specifically, a set $R$ in $\mathcal{B}$ is in $\mathcal{T}_t$ if and only if whenever
$\omega \in R$ and $\omega'$ is such that $\omega'_t = \omega_t$, then $\omega' \in R$.

We state the strong Markov property as the next theorem. 


Theorem 4-9: Let $\{x_n\}$ be a Markov chain with starting distribution
$\pi$, let $t$ be a random time, let $q$ be a statement relative to $\mathcal{F}_t$, let $r$ be a
statement relative to $\mathcal{T}_t$, and suppose $\Pr_\pi[q \wedge x_t = i] > 0$. Then

$$\Pr_\pi[r \mid q \wedge (x_t = i)] = \Pr_\pi[r \mid x_t = i] = \Pr_i[r'],$$

where $\omega \in R$ if and only if $\omega_t \in R'$.


PROOF: We shall prove the theorem for any statement $r$ measurable
with respect to $\mathcal{T}_t \cap \mathcal{F}$. The theorem for general $r$ will then follow,
as in the proof of Lemma 4-7, from the uniqueness half of Theorem 1-19.
Since $x_t \ne i$ when $t = \infty$, we have

$$(q \wedge x_t = i) \equiv \bigvee_{n=0}^{\infty} (q \wedge x_n = i \wedge t = n).$$

We are going to apply Lemma 4-4 with $p$ the statement $r$, with $p_n$ the
statement $q \wedge x_n = i \wedge t = n$, and with $c$ the constant $\Pr_i[r']$. To do
so, we must show that

$$\Pr_\pi[r \mid q \wedge x_n = i \wedge t = n] = \Pr_i[r']$$

whenever $\Pr_\pi[q \wedge x_n = i \wedge t = n] > 0$, and we will have proved that
$\Pr_\pi[r \mid q \wedge x_t = i] = \Pr_i[r']$.

The fact that $\Pr_\pi[r \mid x_t = i]$ equals both of these quantities will follow
by taking $q$ to be a tautology.

Thus we first note that $q \wedge t = n$ is measurable with respect to $\mathcal{F}_n$.
In addition there exists a statement $\bar r$ measurable with respect to $\mathcal{T}_n$
such that

$$(r \wedge t = n) \equiv (\bar r \wedge t = n);$$

this is so because $r$ is the denumerable union of statements

$$x_t = c_0 \wedge \cdots \wedge x_{t+k} = c_k,$$

and we may take $\bar r$ to be the same union over the statements

$$x_n = c_0 \wedge \cdots \wedge x_{n+k} = c_k.$$

In this notation the statement $r'$ is the union of the statements

$$x_0 = c_0 \wedge \cdots \wedge x_k = c_k,$$

and we have that $\omega$ is in the truth set of $\bar r$ if and only if $\omega_n$ is in the
truth set of $r'$. Hence

$$\begin{aligned}
\Pr_\pi[r \mid q \wedge x_n = i \wedge t = n] &= \Pr_\pi[r \wedge t = n \mid q \wedge x_n = i \wedge t = n] \\
&= \Pr_\pi[\bar r \wedge t = n \mid q \wedge x_n = i \wedge t = n] \\
&= \Pr_\pi[\bar r \mid (q \wedge t = n) \wedge x_n = i] \\
&= \Pr_\pi[\bar r \mid x_n = i] \\
&= \Pr_i[r'],
\end{aligned}$$

the last two equalities following from Lemma 4-7.


An equivalent way of stating the first equality of the conclusion of
the preceding theorem is

$$\Pr_\pi[q \wedge r \mid x_t = i] = \Pr_\pi[q \mid x_t = i] \, \Pr_\pi[r \mid x_t = i].$$

This is the form in which the theorem asserts that if the present is
known, then the past and future are independent.
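For a fixed time $t = n$ the factorization can be checked by brute force on a small chain. The sketch below is our own illustration; the three-state matrix `P`, the starting distribution `pi`, and the statements `q` (about $x_0, \ldots, x_n$) and `r` (about $x_{n+1}, x_{n+2}$) are hypothetical choices. It sums path probabilities exactly and compares $\Pr_\pi[q \wedge r \mid x_n = i]$ with $\Pr_\pi[q \mid x_n = i]\Pr_\pi[r \mid x_n = i]$, and also compares $\Pr_\pi[r \mid x_n = i]$ with $\Pr_i[r']$.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical three-state chain and starting distribution pi.
P = [[Fr(1, 2), Fr(1, 4), Fr(1, 4)],
     [Fr(1, 3), Fr(1, 3), Fr(1, 3)],
     [Fr(0), Fr(1, 2), Fr(1, 2)]]
pi = [Fr(1, 3), Fr(1, 3), Fr(1, 3)]
STATES = range(3)


def path_prob(path):
    """Pr_pi[x_0 = path[0], ..., x_m = path[m]] as a product of transitions."""
    prob = pi[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    return prob


def cond(event, given, length):
    """Pr_pi[event | given] for statements about x_0, ..., x_{length-1}."""
    paths = list(product(STATES, repeat=length))
    num = sum(path_prob(w) for w in paths if event(w) and given(w))
    den = sum(path_prob(w) for w in paths if given(w))
    return num / den


n, i, length = 2, 1, 5  # present time n, present state i, paths x_0..x_4


def present(w):
    return w[n] == i


def q(w):  # depends only on x_0, ..., x_n
    return w[0] != w[1]


def r(w):  # depends only on x_{n+1}, x_{n+2}
    return w[n + 1] == 2 or w[n + 2] == 0


lhs = cond(lambda w: q(w) and r(w), present, length)
rhs = cond(q, present, length) * cond(r, present, length)

# Pr_i[r'], where r' is r shifted back n steps: x_1 = 2 or x_2 = 0.
r_shifted = sum(P[i][a] * P[a][b] for a in STATES for b in STATES
                if a == 2 or b == 0)

print(lhs == rhs, cond(r, present, length) == r_shifted)  # True True
```

Exact fractions make the comparison an equality test rather than a tolerance check; enumeration is only feasible for very short paths.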


5. Systems theorems for Markov chains 


As immediate consequences of the strong Markov property, we can
prove two systems theorems for Markov chains. The first states that
if $p$ is a statement depending on outcomes only beyond some random
time $t$, then one may compute $\Pr_\pi[p]$ as if the chain were started with
the initial distribution $\Pr_\pi[x_t = j]$.


Theorem 4-10: Let $\{x_n\}$ be a Markov chain, and let $p$ be a measurable
statement with truth set $P$ satisfying

(1) $\Pr_\pi[p \wedge (t = \infty)] = 0$, and

(2) there exists a statement $p'$ with truth set $P'$ such that if $t(\omega) < +\infty$,
then $\omega \in P$ if and only if $\omega_t \in P'$.

Then

$$\Pr_\pi[p] = \sum_{k \in S} \Pr_\pi[x_t = k] \Pr_k[p'].$$

PROOF: By (1) we have

$$\begin{aligned}
\Pr_\pi[p] &= \sum_k \Pr_\pi[\omega \in P \wedge x_t = k] \\
&= \sum_k \Pr_\pi[\omega_t \in P' \wedge x_t = k] \\
&= \sum_k \Pr_\pi[x_t = k] \Pr_\pi[\omega_t \in P' \mid x_t = k] \\
&= \sum_k \Pr_\pi[x_t = k] \Pr_k[p'] \quad \text{by Theorem 4-9.}
\end{aligned}$$


Theorem 4-10, which is a result about probabilities of statements, can 
also be thought of as a result about means of characteristic functions. 
Then Theorem 4-11 to follow becomes a straightforward generalization 
to arbitrary functions. 


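Theorem 4-11: Let $\{x_n\}$ be a Markov chain, and let $f$ be a random
variable satisfying

(1) $\Pr_\pi[f \ne 0 \wedge t = \infty] = 0$, and
(2) there exists a random variable $f'$ such that if $t(\omega) < \infty$, then
$f(\omega) = f'(\omega_t)$.

Then

$$M_\pi[f] = \sum_k \Pr_\pi[x_t = k] M_k[f'].$$


PROOF: If $f$ assumes negative values, we may prove the result for $f^+$
and $f^-$ separately. We therefore assume $f \ge 0$. Let $p_j^{(m)}$ be the
statement $j/2^m \le f < (j + 1)/2^m$, for $1 \le j \le m2^m - 1$, let $p_0^{(m)}$ be the
statement $0 < f < 1/2^m$, and let $p_{m2^m}^{(m)}$ be the statement $m \le f$. Define
statements ${p'}_j^{(m)}$ similarly for $f'$. Then $p_j^{(m)}$ and ${p'}_j^{(m)}$ satisfy the
hypotheses of Theorem 4-10, so that

$$\Pr_\pi[p_j^{(m)}] = \sum_k \Pr_\pi[x_t = k] \Pr_k[{p'}_j^{(m)}].$$

Hence

$$\begin{aligned}
M_\pi[f] &= \lim_{m} \sum_{j=0}^{m2^m} \frac{j}{2^m} \Pr_\pi[p_j^{(m)}] \\
&= \lim_{m} \sum_k \Pr_\pi[x_t = k] \sum_{j=0}^{m2^m} \frac{j}{2^m} \Pr_k[{p'}_j^{(m)}] \\
&= \sum_k \Pr_\pi[x_t = k] \lim_{m} \sum_{j=0}^{m2^m} \frac{j}{2^m} \Pr_k[{p'}_j^{(m)}] \quad \text{by monotone convergence} \\
&= \sum_k \Pr_\pi[x_t = k] M_k[f'].
\end{aligned}$$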
6. Applications of systems theorems


The theorems of Section 5 will play an important role in our study of 
denumerable Markov chains. At this time we shall not illustrate the 
full power of the theorems but shall be content instead to use them in 
developing some of the machinery needed for the classification of states 
in Section 7. 

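We begin by introducing some notation. Define

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$

Let $h_j$ be the statement about a Markov chain that state $j$ is eventually
reached. We have already defined the random variables $n_j$ and $t_j$ for
general stochastic processes (see Section 2-6); $n_j$ is the number of times
in state $j$, and $t_j$ is the time to reach state $j$. Let $f_j^{(k)}$ be the statement
that $t_j(\omega) = k$.

Confining ourselves to Markov chains, we associate the quantities
$\bar h_j$, $\bar n_j$, $\bar t_j$, and $\bar f_j^{(k)}$ with $h_j$, $n_j$, $t_j$, and $f_j^{(k)}$. They are defined as follows:

$$\begin{aligned}
\bar h_j &: \; h_j \text{ is true for } \omega_1, \\
\bar n_j(\omega) &= n_j(\omega_1), \\
\bar t_j(\omega) &= t_j(\omega_1) + 1, \\
\bar f_j^{(k)} &: \; \bar t_j(\omega) = k.
\end{aligned}$$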
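In terms of these quantities, we define a collection of matrices. We
note that, in general, an expression of the form $\{M_i[g_j]\}$ stands for a
matrix.

$$\begin{aligned}
H_{ij} &= \Pr_i[h_j], & \bar H_{ij} &= \Pr_i[\bar h_j], \\
N_{ij} &= M_i[n_j], & \bar N_{ij} &= M_i[\bar n_j], \\
F_{ij}^{(k)} &= \Pr_i[f_j^{(k)}], & \bar F_{ij}^{(k)} &= \Pr_i[\bar f_j^{(k)}].
\end{aligned}$$

It is trivial to verify that $H_{jj} = 1$, that $F^{(0)} = I$, that $F^{(1)} = P - P_{\mathrm{dg}}$
(where $P_{\mathrm{dg}}$ is the diagonal matrix with the same diagonal entries as $P$),
and that $N = I + \bar N$.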


Proposition 4-12: If $P$ is a Markov chain, then

$$N = \sum_{n=0}^{\infty} P^n.$$

PROOF: The result follows immediately from Propositions 2-10 and 4-3.
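Proposition 4-12 can be illustrated numerically on a small hypothetical example: the symmetric random walk on $\{0, 1, 2, 3, 4\}$ with 0 and 4 absorbing, restricted to its transient states 1, 2, 3 (indices 0, 1, 2 below). The sketch, our own illustration, approximates $N$ by a truncated series and then checks the identity $N = I + PN$ of Proposition 4-13 entry by entry. Truncation is safe here only because the restricted matrix has spectral radius below 1; it would not be appropriate for a recurrent state, where $N_{jj} = +\infty$.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mean_visits(Q, terms=500):
    """Approximate N = sum_{n>=0} Q^n by a truncated series (Proposition 4-12)."""
    size = len(Q)
    power = [[float(i == j) for j in range(size)] for i in range(size)]  # Q^0 = I
    N = [row[:] for row in power]
    for _ in range(terms):
        power = mat_mul(power, Q)
        N = [[N[i][j] + power[i][j] for j in range(size)] for i in range(size)]
    return N


# Hypothetical example: symmetric walk on {0, 1, 2, 3, 4}, 0 and 4 absorbing;
# Q is P restricted to the transient states 1, 2, 3.
Q = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
N = mean_visits(Q)
QN = mat_mul(Q, N)
# Check N = I + QN (Proposition 4-13) entry by entry.
ok = all(abs(N[i][j] - (float(i == j) + QN[i][j])) < 1e-9
         for i in range(3) for j in range(3))
print(N[0], ok)  # mean visits starting from state 1, and the check result
```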
Proposition 4-13: If $P$ is a Markov chain, then $\bar N = PN$ and
$N = I + PN$.


PROOF: The second identity follows from the first by adding $I$ to
both sides. To obtain the first one, we apply Theorem 4-11 with
$f = \bar n_j$ and the random time identically one. Then $f' = n_j$, since
$\bar n_j(\omega) = n_j(\omega_1)$ by definition, and thus

$$M_i[\bar n_j] = \sum_k \Pr_i[x_1 = k] M_k[n_j] = \sum_k P_{ik} M_k[n_j],$$

or $\bar N = PN$.
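Proposition 4-14: If $P$ is a Markov chain, then

$$H = \sum_{k=0}^{\infty} F^{(k)}, \qquad \bar H = \sum_{k=1}^{\infty} \bar F^{(k)}, \qquad \text{and} \qquad \bar H = PH.$$

PROOF: The first two assertions follow from the complete additivity
of $\mu$; we have $h_j = \bigvee_k f_j^{(k)}$ and $\bar h_j = \bigvee_k \bar f_j^{(k)}$ disjointly. For the third
assertion we apply Theorem 4-10 with $p = \bar h_j$ and the random time
identically one. Then $p'$ is the statement $h_j$ and

$$\bar H_{ij} = \Pr_i[p] = \sum_k \Pr_i[x_1 = k] \Pr_k[p'] = \sum_k P_{ik} H_{kj}.$$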
Proposition 4-15: If $P$ is a Markov chain, then

$$N_{ij} = H_{ij} N_{jj}, \qquad \bar N_{ij} = \bar H_{ij} N_{jj}, \qquad N_{ii} = 1 + \bar H_{ii} N_{ii}.$$

PROOF: The third assertion follows from the second and the identity
$N = I + \bar N$ with $j = i$. For the first assertion we apply Theorem 4-11
with $f = n_j$ and the random time equal to $t_j$. It is clear that $n_j(\omega) =
n_j(\omega_{t_j})$ if $t_j(\omega) < \infty$. Therefore $f' = n_j$, and

$$\begin{aligned}
N_{ij} = M_i[n_j] &= \sum_k \Pr_i[x_{t_j} = k] M_k[n_j] \\
&= \Pr_i[x_{t_j} = j] M_j[n_j] \\
&= \Pr_i[t_j < \infty] M_j[n_j] \\
&= H_{ij} N_{jj}.
\end{aligned}$$

Similarly, for the second assertion we apply Theorem 4-11 with $f = \bar n_j$,
$f' = n_j$, and the random time equal to $\bar t_j$. By the same kind of
argument, we find

$$M_i[\bar n_j] = \bar H_{ij} M_j[n_j].$$


Proposition 4-16: Let $p$ be the statement that a Markov chain reaches
state $j$ and then state $k$, with $j \ne k$. Then $\Pr_i[p] = H_{ij} H_{jk}$.


PROOF: In the notation of Theorem 4-10, if $t_j$ is taken as the random
time, then $p'$ is the statement $h_k$. The theorem applies and

$$\begin{aligned}
\Pr_i[p] &= \sum_m \Pr_i[x_{t_j} = m] \Pr_m[p'] \\
&= \Pr_i[x_{t_j} = j] \Pr_j[h_k] \\
&= H_{ij} H_{jk}.
\end{aligned}$$
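A quick Monte Carlo check of Proposition 4-16, under hypothetical choices of ours: the symmetric random walk on $\{0, \ldots, 4\}$ absorbed at 0 and 4, started at $i = 2$, with $j = 3$ and $k = 1$. The standard gambler's-ruin formulas give $H_{23} = 2/3$ and $H_{31} = 1/3$, so the proposition predicts probability $2/9$ of reaching 3 and afterward 1. This is a sanity check by simulation, not a proof.

```python
import random


def reaches_j_then_k(rng, start, j, k, absorbing=(0, 4)):
    """Simulate the symmetric walk until absorption and report whether it
    visits j and, at some later time, k (a visit at time 0 counts)."""
    state, seen_j = start, start == j
    while state not in absorbing:
        state += rng.choice((-1, 1))
        if seen_j and state == k:
            return True
        if state == j:
            seen_j = True
    return False


rng = random.Random(2024)  # fixed seed so the run is reproducible
trials = 100_000
hits = sum(reaches_j_then_k(rng, start=2, j=3, k=1) for _ in range(trials))
print(hits / trials, 2 / 9)  # the estimate should be close to 2/9
```

With $10^5$ trials the standard error is about $0.0013$, so agreement to two decimal places is the most one should expect.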


In our discussion of Markov chains, we shall make frequent use of
the following notational devices. Let $k$ and $j$ be states of a Markov
chain. By ${}^{k}n_j(\omega)$ is meant the number of times on the path $\omega$ that
the process is in state $j$ before (and not including) the first time
that the process is in $k$. We define ${}^{k}\bar n_j$ as the number of visits to $j$
before the process reaches $k$ after time 0. Notice that ${}^{j}n_j(\omega) = 0$, but
${}^{j}\bar n_j(\omega)$ is 1 if $\omega$ starts with $j$. For fixed $k$ we introduce the corresponding
matrices ${}^{k}N$ and ${}^{k}\bar N$ by

$${}^{k}N_{ij} = M_i[{}^{k}n_j(\omega)], \qquad {}^{k}\bar N_{ij} = M_i[{}^{k}\bar n_j(\omega)].$$

They are related as follows:

$${}^{k}\bar N_{ij} = \delta_{ij} + M_i[{}^{k}n_j(\omega_1)].$$

We further define ${}^{k}H_{ij}$ to be the probability of hitting $j$ before $k$,
having started in $i$; ${}^{k}\bar H_{ij}$ is the probability of hitting $j$ before hitting $k$
after time 0, having started in $i$.

We will later want a more general notation than ${}^{k}n_j(\omega)$. By
${}^{E}n_j(\omega)$ we shall mean the number of times on the path $\omega$ that the
process is in $j$ before it is in any state of the set $E$. It is sometimes
convenient, in this connection, to think in terms of the modified chain
in which the states of $E$ have been made absorbing. Again we have
matrices ${}^{E}N$, and we also introduce the matrices ${}^{E}H$ and ${}^{E}\bar H$ analogously.

If $E$ is a subset of the set of states $S$ for which neither $E$ nor $\tilde E$ is
empty, we shall decompose the $P$ matrix into

$$P = \begin{array}{c|cc} & E & \tilde E \\ \hline E & T & U \\ \tilde E & R & Q \end{array}$$

according to the method discussed at the end of Section 1-1. If $A$
is an arbitrary matrix indexed by the set $S$, we write $A_E$ for the
restriction of $A$ to a matrix indexed only by $E$. As an example, we note


7. Classification of states 


We introduce a partial ordering on the states $S$ of a Markov chain $P$.
Two states $i$ and $j$ are said to be $R$-related, written $R(i, j)$, if $H_{ij} > 0$,
that is, if it is possible to reach $j$ from $i$. If $R(i, j)$ and $R(j, i)$, we say
that $i$ and $j$ communicate and write $i \sim j$. To see that $R$ is a partial
ordering, we note that

(1) $H_{ii} = 1 > 0$, so that $R(i, i)$.
(2) If $R(i, j)$ and $R(j, k)$, then $R(i, k)$ because $H_{ik} \ge H_{ij} H_{jk} > 0$
by Proposition 4-16.


The reader should verify that ~ is an equivalence relation. 

The relation ~ therefore partitions the states of S into equivalence 
classes within the ordering, and movement from state to state is within 
a class or upward through the ordering. We do not assert the existence 
of maximal classes; we shall see an example later where no maximal 
classes are present. (The reader should then be able to exhibit an 
example of a chain having no minimal classes.) 


[Figure: Flow in a Markov chain (labels: maximal classes, minimal classes).]


Proposition 4-17: States $i$ and $j$ are $R$-related if and only if there
exists an $n \ge 0$ for which $(P^n)_{ij} > 0$.


PROOF: Suppose $n \ge 0$ is the smallest exponent for which $(P^n)_{ij} > 0$.
Then $F_{ij}^{(n)} = (P^n)_{ij} > 0$, and since $H_{ij} = \sum_k F_{ij}^{(k)}$, we have $R(i, j)$.
Conversely, if $R(i, j)$, then $H_{ij} > 0$ and it must be true that $F_{ij}^{(n)} > 0$ for
some $n$. Since $F_{ij}^{(n)} \le (P^n)_{ij}$, it follows that $(P^n)_{ij} > 0$.
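For a chain with finitely many states, Proposition 4-17 reduces the relation $R$ to reachability in the directed graph that has an edge $i \to j$ whenever $P_{ij} > 0$, with paths of length 0 included. A minimal sketch, our own illustration, computes $R$ with Warshall's transitive-closure algorithm (a standard graph technique, not one used in the text) and then groups the states into $\sim$ classes; the five-state matrix is a hypothetical example.

```python
def reachability(P):
    """R[i][j] is True when (P^n)_ij > 0 for some n >= 0 (Proposition 4-17)."""
    size = len(P)
    R = [[i == j or P[i][j] > 0 for j in range(size)] for i in range(size)]
    for k in range(size):          # Warshall's transitive closure
        for i in range(size):
            if R[i][k]:
                for j in range(size):
                    if R[k][j]:
                        R[i][j] = True
    return R


def communicating_classes(P):
    """Partition the states into the equivalence classes of i ~ j."""
    R = reachability(P)
    classes, assigned = [], set()
    for i in range(len(P)):
        if i not in assigned:
            cls = [j for j in range(len(P)) if R[i][j] and R[j][i]]
            classes.append(cls)
            assigned.update(cls)
    return classes


# Hypothetical example: symmetric walk on {0, ..., 4}, absorbed at 0 and 4.
P = [[1.0, 0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
print(communicating_classes(P))  # [[0], [1, 2, 3], [4]]
```

This only works for finite state spaces; for a denumerable chain the relation has to be established by argument, as in the text.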
Definition 4-18: A state $i$ is said to be recurrent if $\bar H_{ii} = 1$; it is said to
be transient if $\bar H_{ii} < 1$.


The lemma to follow contains some identities connecting $H$ and $\bar H$
which will be used in the next few propositions. The reader should
study these examples of the use of the strong Markov property in order
to develop his intuition.


Lemma 4-19: The following statements hold:


(1) The probability starting in $i$ of returning to state $i$ at least $k$
times is $(\bar H_{ii})^k$. (Use the convention $0^0 = 1$.)

(2) The probability starting in $i$ of returning at least $k$ times to $i$
before hitting $j$ is $({}^{j}\bar H_{ii})^k$, provided $i \ne j$.

(3) The probability starting in $i$ of returning to $i$ via $j$ is ${}^{i}\bar H_{ij} H_{ji}$,
provided $i \ne j$.

(4) The probability starting in $i$ of reaching $j$ for the first time after
$n$ returns to $i$ is $({}^{j}\bar H_{ii})^n \, {}^{i}\bar H_{ij}$, provided $i \ne j$.

(5) The probability starting in $i$ of being in state $j$ at least $n$ times
is $H_{ij} (\bar H_{jj})^{n-1}$.


PROOF: The proofs are all by Theorem 4-10.

(1) Use induction on $k$. For $k = 0$ the result is trivial; assume that
it holds for $k - 1$. Let $p$ be the statement that the process returns to $i$
at least $k$ times, and let $t = \bar t_i$. Then $p'$ is the statement that the
process returns to $i$ at least $k - 1$ times, and

$$\begin{aligned}
\Pr_i[p] &= \sum_j \Pr_i[x_t = j] \Pr_j[p'] \\
&= \Pr_i[x_t = i] \Pr_i[p'] \\
&= \Pr_i[\bar t_i < \infty] \Pr_i[p'] \\
&= \bar H_{ii} (\bar H_{ii})^{k-1}
\end{aligned}$$

by the inductive hypothesis.

(2) If $i \ne j$, the result is the same as (1) for the chain in which the
single state $j$ has been made absorbing (see Example 2, Section 4-2).

(3) Let $p$ be the statement that the process returns to $i$ via $j$, and let
$t$ be the time that $j$ is reached if $j$ is reached before a return to $i$, or $+\infty$
if $j$ is not reached before $i$. Then $p'$ is the statement that $i$ is reached,
and

$$\begin{aligned}
\Pr_i[p] &= \sum_k \Pr_i[x_t = k] \Pr_k[p'] \\
&= \Pr_i[x_t = j] H_{ji} \\
&= \Pr_i[t < \infty] H_{ji} \\
&= {}^{i}\bar H_{ij} H_{ji}.
\end{aligned}$$

(4) The argument is the same as in (3). Use the systems theorem
with $t$ equal to the time of the $n$th return to $i$ if the return occurs before
$j$ is reached, or $+\infty$ otherwise.

(5) The proof is by induction on $n$ and is the same as in (1) except
that the random time becomes $t = t_j$.


Proposition 4-20: State $i$ is transient if and only if $N_{ii} < +\infty$. Then
$N_{ii} = 1/(1 - \bar H_{ii})$.


PROOF:

$$N_{ii} = \sum_{k=1}^{\infty} k \Pr_i[n_i = k],$$

which upon rearrangement of terms becomes

$$\sum_{k=1}^{\infty} \sum_{m=k}^{\infty} \Pr_i[n_i = m],$$

which by complete additivity is

$$\sum_{k=1}^{\infty} \Pr_i[n_i \ge k] = \sum_{k=1}^{\infty} (\bar H_{ii})^{k-1}$$

by conclusion (1) of Lemma 4-19. The right side is finite if and only if $\bar H_{ii} < 1$.


Corollary 4-21: If $j$ is a transient state, then $N_{ij} < \infty$ for all states $i$
in the chain, and $N_{ij} = H_{ij} N_{jj}$.


PROOF: From Proposition 4-15 we have

$$N_{ij} = H_{ij} N_{jj} \le N_{jj}.$$

The result now follows from Proposition 4-20.
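Proposition 4-20 and Corollary 4-21 let one read hitting probabilities off the matrix $N$ of a transient chain: $\bar H_{jj} = 1 - 1/N_{jj}$ and $H_{ij} = N_{ij}/N_{jj}$. The sketch below is our own illustration on a hypothetical example, the transient states 1, 2, 3 (indices 0, 1, 2) of the symmetric walk on $\{0, \ldots, 4\}$ absorbed at 0 and 4, with $N$ approximated by a truncated series.

```python
def mean_visits(Q, terms=500):
    """Truncated series N = I + Q + Q^2 + ... for a substochastic matrix Q."""
    size = len(Q)
    N = [[float(i == j) for j in range(size)] for i in range(size)]
    power = [row[:] for row in N]
    for _ in range(terms):
        power = [[sum(power[i][k] * Q[k][j] for k in range(size))
                  for j in range(size)] for i in range(size)]
        for i in range(size):
            for j in range(size):
                N[i][j] += power[i][j]
    return N


Q = [[0.0, 0.5, 0.0],   # transient states 1, 2, 3 of the walk on {0, ..., 4}
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
N = mean_visits(Q)
size = len(Q)
H = [[N[i][j] / N[j][j] for j in range(size)] for i in range(size)]  # Corollary 4-21
H_bar_diag = [1 - 1 / N[j][j] for j in range(size)]                  # Proposition 4-20
print(H[0], H_bar_diag)
```

For instance, $H$ from state 1 to state 3 comes out as $1/3$, matching the gambler's-ruin probability of reaching 3 before 0 from 1.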


We are now in a position to put together the ideas of recurrence and 
transience with the partial ordering R and the equivalence relation ~. 
We need two lemmas before we can prove our fundamental result— 
that all states in an equivalence class are of the same type, recurrent or 
transient. 


Lemma 4-22: If $i \ne j$ and $R(i, j)$, then ${}^{j}\bar H_{ii} < 1$ and ${}^{j}N_{ii} < +\infty$.


PROOF: Suppose ${}^{j}\bar H_{ii} = 1$. By conclusion (2) of Lemma 4-19 the
probability of returning $n$ times before hitting $j$ is $({}^{j}\bar H_{ii})^n = 1$. Hence,
by Proposition 2-6, there is probability one for returning infinitely often
before hitting $j$, in contradiction to the relation $R(i, j)$. For the second
half of the lemma, we have, as in Proposition 4-20,

$${}^{j}N_{ii} = \sum_{n=0}^{\infty} ({}^{j}\bar H_{ii})^n < +\infty.$$


Lemma 4-23: If $i$ is recurrent and $R(i, j)$, then $H_{ji} = 1$ and $H_{ij} = 1$.


PROOF: The result is obvious if $i = j$. If $i \ne j$, consider returning to
$i$ with and without first reaching $j$. By conclusion (3) of Lemma 4-19,

$$1 = \bar H_{ii} = {}^{i}\bar H_{ij} H_{ji} + {}^{j}\bar H_{ii}.$$

Since ${}^{i}\bar H_{ij} + {}^{j}\bar H_{ii} \le 1$, this equation is a contradiction unless ${}^{j}\bar H_{ii} = 1$
or $H_{ji} = 1$. The first alternative is ruled out by Lemma 4-22. Thus
$H_{ji} = 1$ and ${}^{i}\bar H_{ij} = 1 - {}^{j}\bar H_{ii}$. Next, since $i$ is recurrent, one may
compute $H_{ij}$ by summing the probabilities of reaching $j$ for the first
time after $n$ returns to $i$, where $n = 0, 1, 2, \ldots$. By conclusion (4) of
Lemma 4-19,

$$H_{ij} = \sum_{n=0}^{\infty} ({}^{j}\bar H_{ii})^n \, {}^{i}\bar H_{ij} = (1 - {}^{j}\bar H_{ii}) \sum_{n=0}^{\infty} ({}^{j}\bar H_{ii})^n = 1.$$

The last equality holds, since ${}^{j}\bar H_{ii} < 1$ by Lemma 4-22.


Proposition 4-24: All states in an equivalence class are of the same
type, recurrent or transient.


PROOF: It is sufficient to show that if one state in an equivalence
class is recurrent, so are all others. Let $i$ be a recurrent state, and
suppose $j \sim i$, $j \ne i$. Then $\bar H_{jj} \ge H_{ji} H_{ij}$, since the probability of
returning is at least as great as the probability of returning via $i$.
(We have used an argument familiar from Proposition 4-16 to compute
the latter.) Hence $\bar H_{jj} = 1$ by Lemma 4-23.


Corollary 4-25: If $i$ is recurrent and $i \sim j$, then $H_{ij} = H_{ji} = 1$.

PROOF: The corollary follows from Lemma 4-23.


Because of Proposition 4-24 we are free to speak of transient and
recurrent classes of states. We shall mention a few simple results
about classes of states. By a closed class we mean one that is
impossible to leave. A process cannot disappear when it is in a closed
class.




Proposition 4-26: Recurrent classes are closed and maximal with
respect to the partial ordering $R$.


PROOF: It is sufficient to prove that a recurrent class $S'$ is closed,
since closed classes are clearly maximal. Suppose the class can be left,
say from a state $j \in S'$. If $k$ is a state outside $S'$ for which $P_{jk} > 0$,
then it is not true that $R(k, j)$ because $j$ and $k$ do not communicate.
Thus $\bar H_{jj} \le 1 - P_{jk} < 1$, and $j$ is not recurrent.


Proposition 4-27: If a Markov chain is started in a recurrent class $S'$,
then the chain is in every state of $S'$ infinitely often with probability
one. In particular, if $i$ and $j$ are in $S'$, then $N_{ij} = +\infty$.


PROOF: Suppose the chain is started in state $i$. Then, by conclusion
(5) of Lemma 4-19, the probability of being in state $j$ at least $n$ times is
$H_{ij} (\bar H_{jj})^{n-1} = 1$. By Proposition 2-6 the chain is in state $j$ infinitely
often with probability one. Again by Proposition 2-6 it is in every
state infinitely often with probability one.


Proposition 4-28: A Markov chain is in a finite subset of transient
states only finitely often, with probability one.


PROOF: If the chain were in a finite set $S'$ infinitely often with positive
probability, it would be in one state $j$ of $S'$ infinitely often with positive
probability. Such an occurrence would imply that $N_{ij}$ is infinite for
some $i$, in contradiction to Corollary 4-21 if $j$ is transient.


We single out two kinds of Markov chains for special attention. We 
note that every absorbing state forms a one-element recurrent class, and 
conversely. 


Definition 4-29: A Markov chain is said to be a recurrent chain if its 
states comprise a single equivalence class and if that class is recurrent. 
A chain is called a transient chain if all of its recurrent states are 
absorbing. 


If $P$ is an arbitrary Markov chain with $r$ recurrent classes, then all
properties of $P$ can be deduced from the properties of one transient and
$r$ recurrent chains. This assertion follows from the observations:


(1) If the process $P$ starts in a recurrent state $j$, movement from state
to state is confined to the single equivalence class to which $j$ belongs.
The properties of the chain started in $j$ are the properties of a chain
while it is in one recurrent class; they are thus the properties of a
recurrent chain.

(2) If the process $P$ starts in a transient state, its behavior while in
transient states is the same as the behavior of the transient chain $P'$
obtained from $P$ by making all recurrent states absorbing. If $P$ enters
a recurrent state, then $P'$ becomes absorbed. After $P$ has entered
a recurrent state, its properties are those of a recurrent chain. Thus the
properties of $P$ may be studied by considering the one transient chain
$P'$ and the $r$ separate recurrent chains.


Because of these observations, we shall restrict our discussion in 
subsequent chapters to Markov chains which are either transient or 
recurrent. 

The reader should notice that every chain whose states form only one 
equivalence class is either a transient chain or a recurrent chain. 
Shortly we shall examine the basic example, in which all pairs of states 
communicate, to determine when it is transient and when it is recurrent. 

First we pause briefly to discuss some properties of maximal classes.
Not every chain has maximal classes; a tree process, for example, 
consists of infinitely many transient classes of one state each. None 
of the classes is maximal. Even if a chain does have a maximal class, 
that class does not have to be closed. The process may have a positive 
probability of disappearing from some state in the maximal class. 

Nor is it true that all closed classes are recurrent. An additional 
condition is needed. 


Proposition 4-30: All closed equivalence classes consisting of finitely
many states are recurrent.


PROOF: Let the states be the first $n$ positive integers, and suppose the
class is transient. Then $N_{ij}$ is finite for every $i$ and $j$ in the class.
Therefore

$$c = M_1\Big[\sum_{j=1}^{n} n_j\Big] = \sum_{j=1}^{n} N_{1j}$$

is finite. But $c$ is the mean total number of steps taken in the class,
and $c$ is infinite because the class is closed. This contradiction
establishes the proposition.
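For a chain with finitely many states, Propositions 4-26 and 4-30 together give a complete test: a class is recurrent exactly when it is closed. The sketch below, our own illustration, applies that test, treating a class as closed when every row of $P$, restricted to the class, sums to 1 (so a class from which probability can disappear also counts as not closed). The four-state matrix is a hypothetical example.

```python
def classify(P, tol=1e-12):
    """Label each communicating class of a finite chain as recurrent or transient.

    A class is closed when every row, restricted to the class, sums to 1.
    Finite closed classes are recurrent (Proposition 4-30); classes that can
    be left are transient (Proposition 4-26).
    """
    size = len(P)
    R = [[i == j or P[i][j] > 0 for j in range(size)] for i in range(size)]
    for k in range(size):                      # transitive closure of R
        for i in range(size):
            if R[i][k]:
                for j in range(size):
                    R[i][j] = R[i][j] or R[k][j]
    result, assigned = [], set()
    for i in range(size):
        if i in assigned:
            continue
        cls = [j for j in range(size) if R[i][j] and R[j][i]]
        assigned.update(cls)
        closed = all(abs(sum(P[a][b] for b in cls) - 1) < tol for a in cls)
        result.append((cls, "recurrent" if closed else "transient"))
    return result


# Hypothetical four-state chain: {0, 1} leaks into the closed pair {2, 3}.
P = [[0.5, 0.25, 0.25, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
print(classify(P))  # [([0, 1], 'transient'), ([2, 3], 'recurrent')]
```

The test relies on finiteness; as the basic example below shows, an infinite closed class can be transient.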


To see that infinite closed equivalence classes need not be recurrent,
we consider the basic example, whose states form a single equivalence
class. Let $\bar H_{00}^{(n)}$ be the probability that the chain, started in state 0,
returns to 0 at some time up to and including time $n$. Then

$$\bar H_{00} = \lim_{n} \bar H_{00}^{(n)}.$$

But

$$1 - \bar H_{00}^{(n)} = p_0 p_1 \cdots p_{n-1} = \beta_n,$$

since a single step other than away from zero returns the process to
zero at once. Now in order for the process to be recurrent it is neces-
sary and sufficient that $\bar H_{00} = 1$ or that

$$\lim_n \beta_n = \lim_n \left(1 - \bar H_{00}^{(n)}\right) = 0.$$

The reader should be able to construct examples where $\lim_n \beta_n = 0$ and
where $\lim_n \beta_n \ne 0$. Thus the basic example may be either transient or
recurrent.
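One possible pair of examples, chosen by us for illustration: with $p_k = (k+1)/(k+2)$ the product telescopes to $\beta_n = 1/(n+1) \to 0$, so the chain is recurrent; with $p_k = 1 - 1/(k+2)^2$ it telescopes to $\beta_n = (n+2)/(2(n+1)) \to 1/2$, so the chain is transient with $\bar H_{00} = 1/2$. The sketch computes $\beta_n$ exactly for both choices.

```python
from fractions import Fraction


def beta(n, p):
    """beta_n = p_0 p_1 ... p_{n-1}: probability of no return to 0 by time n."""
    result = Fraction(1)
    for k in range(n):
        result *= p(k)
    return result


def recurrent_p(k):
    return Fraction(k + 1, k + 2)          # beta_n = 1/(n+1), which tends to 0


def transient_p(k):
    return 1 - Fraction(1, (k + 2) ** 2)   # beta_n = (n+2)/(2(n+1)), tends to 1/2


for n in (10, 100, 1000):
    print(n, beta(n, recurrent_p), float(beta(n, transient_p)))
```

A finite computation can only suggest the limit; here the closed forms in the comments settle it.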


8. Problems 


1. Find an expression analogous to that in Proposition 4-3 for $\Pr[x_n = j]$
in a Markov process.


2. Let $p_m$ be the Poisson distribution with mean $m$ on the non-negative
integers. A game is played as follows: A random integer $n_1$ is selected
with probabilities determined by $p_1$. A second random integer $n_2$ is
selected with probabilities determined by $p_{n_1}$. The $i$th random integer
is selected with probabilities determined by $p_{n_{i-1}}$. Prove that with
probability one the integer 0 is eventually selected.

3. Show that if $h \ge 0$ is a column vector for which $P^n h$ converges, then the
limit function is non-negative superregular.


4. Let $j$ be an absorbing state. Prove that the probability starting at $i$ of
ever reaching $j$ is a regular function.


5. Show that an independent trials process is a Markov chain in which $P_{ij}$
is independent of $i$. Let 0 be any fixed state and let $t$ be any stopping
time. Show that $\Pr_0[x_{t+1} = j] = P_{0j}$, and give an example to show that
$\Pr_0[x_t = j]$ does not have to equal $P_{0j}$.


6. If the symmetric random walk in 3 dimensions is started at the origin,
the probability of being at the origin after $n$ steps is 0 if $n$ is odd and is of
the order of magnitude of $n^{-3/2}$ for $n$ even. Prove that the probability
of returning to the origin is less than 1.


7. Consider the following random walk in the plane. If the process is not 
on an axis, it is equally likely to move to any of the four neighboring 
states. If it is at the origin, it stays at the origin. Otherwise, on the 
x-axis it takes a step away from the origin, whereas on the y-axis it takes 
a step toward the origin. Give a complete classification of the states. 


8. Let $j$ be a transient state in a closed class. Prove that there must be a
state $i$ in the class such that $H_{ij} < 1$.


9. Prove that every tree-process is a transient chain and that each equiv- 
alence class of states is a unit-set. 


10. Prove or disprove: In a chain with a minimal class and with no closed 
class, there is no non-zero non-negative regular measure. 


Problems 11 to 14 refer to a reflecting random walk, that is, a random walk
on the non-negative integers with $q_0 = 0$.


11. Prove that the only regular functions are constants. 
12. Let

$$\beta_i = p_0 p_1 \cdots p_{i-1}, \qquad \gamma_i = q_1 q_2 \cdots q_i, \qquad \alpha_i = c\,\beta_i/\gamma_i.$$

Show that, for any choice of the constant $c$, $\alpha$ is a regular measure.


13. Show that all regular measures are of the form given in the previous
problem.

14. Show that $\alpha_i P_{ij}/\alpha_j = P_{ji}$ for all $i$ and $j$.


Problems 15 to 18 refer to a branching process and use the notation of
Sections 2 and 3 of the text.


15. Show that the roots of the equation $g(r) = r$ satisfy the following
conditions:
(a) There is an $r < 1$ if $m > 1$.
(b) There is an $r > 1$ if $m < 1$.
(c) $r = 1$ is the only root if $m = 1$.

16. Show that $\{r^{x_n}\}$ is a martingale if and only if $g(r) = r$.

17. Show that $\{x_n\}$ is a martingale if and only if $m = 1$.


18. What condition on $m$ will assure that the branching process has positive
probability of survival (of not dying out)?


Problems 19 to 24 concern space-time processes and martingales. If $P$ is a
Markov chain with state space $S$, we define the space-time process to be a
Markov chain whose states are pairs $(i, n)$, where $i$ is in $S$ and $n$ is a non-
negative integer, and which moves from $(i, n)$ to $(j, n + 1)$ with probability
$P_{ij}$.

19. Prove that any space-time process is transient. What can be said about
classification of states?

20. Prove that if $f(i, n)$ is a finite-valued non-negative regular function for
the space-time process, then $f(x_n, n)$ is a martingale for the process $P$
started at a given state 0.

21. Specialize to the case of sums of independent random variables on the
integers with $p_k = 0$ for $k < 0$. Define $\varphi(t) = \sum_k p_k t^k$ for all $t > 0$ for
which the right side is finite. Show that $\varphi(t)$ is defined at least for
$0 < t \le 1$. Fix a $t$ for which $\varphi(t)$ is defined and put

$$f(i, n) = \frac{t^i}{[\varphi(t)]^n}.$$

Prove that $f(i, n)$ is regular for the space-time process.


22. In Problem 21 show that $f(x_n, n)$ converges a.e. if the process is started
at 0.


23. Specialize further to the case where po = p, = 4, and define, for 
Osi<l 
g(t, n) = t(l — t)i. 
Show by change of variable in Problem 22 that g(z,, n) converges almost 
everywhere in the process started at 0. 
24. Using only the result of Problem 23, prove that if p is any number 


between 0 and 1, not equal to 3, then the probability that 2, = [np] for 
infinitely many n is 0. Here [np] is the nearest integer to np. 


CHAPTER 5 


TRANSIENT CHAINS 


1. Properties of transient chains 


Recall that a transient Markov chain is a Markov chain all of whose
recurrent states are absorbing. Its transition matrix satisfies
$P1 \le 1$. For any transient state $j$ in the chain, we have seen that
$H_{jj} < 1$ and $N_{ij} < +\infty$ for every $i$. If $E$ is any set of states, we can
put the transition matrix in the canonical form
$$P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix},$$
where the first block of rows and columns is indexed by $E$ and the second by its complement $\tilde E$.

In the special case in which $P$ is a transient chain and $E$ is the set of
absorbing states, we find that $T = I$ and $U = 0$. (If there are no
absorbing states, we agree to write $P = Q$. We shall assume that not
all states are absorbing, however.) Thus, for a transient chain,
$$P = \begin{pmatrix} I & 0 \\ R & Q \end{pmatrix}.$$


The matrices R and Q for a transient chain will always be associated 
with this standard decomposition. We observe that Q itself is the 
transition matrix for a transient chain and that this chain has only 
transient states. Some authors actually define a transient chain to be 
one with all states transient. However, in the study of these chains, 
it is often convenient to add absorbing states to ensure P1 = 1. And 
as we saw in Chapter 4, the decomposition of general Markov chains 
into transient and recurrent chains depends on allowing absorbing 
states in transient chains. For these reasons we have adopted the 
slightly more general definition of transient chain which permits 
absorbing states. 

Let $P$ be the transition matrix of a transient chain, and consider the
quantity $N_{ij}$, the mean number of times in state $j$ when the process is
started in state $i$. If $j$ is an absorbing state, then this quantity is
infinite if $j$ can be reached from $i$ with positive probability and 0 otherwise. If $i$ is absorbing, it is 0 unless $i = j$, and then it is infinite.
Hence $N_{ij}$ is of interest only when $i$ and $j$ are transient. Thus we shall
agree to restrict $N$ to these entries. The matrix so restricted is called
the fundamental matrix for the chain. We shall show that the restricted matrix is the matrix $N$ for the chain determined by $Q$. In
what follows, $N$ always denotes the restricted matrix associated with $P$.


Lemma 5-1: If $P$ is a transient chain and if $\tilde E$ is the set of transient
states, then $(P^k)_{\tilde E} = Q^k$.

PROOF: We readily verify by induction that
$$P^k = \begin{pmatrix} I & 0 \\ (I + Q + \cdots + Q^{k-1})R & Q^k \end{pmatrix},$$
and the result follows at once.

Proposition 5-2:
$$N = \sum_{k=0}^{\infty} Q^k.$$

PROOF: For transient states $i$ and $j$, we have in the $P$-process
$$N_{ij} = \sum_k (P^k)_{ij} = \sum_k (Q^k)_{ij}$$
by Proposition 4-12 and Lemma 5-1.

Proposition 5-3: $N$ is finite-valued, and $\lim_k Q^k = 0$.

PROOF: $N_{ij}$ in the $P$-process is finite when $j$ is transient; hence $N$ is
finite-valued. Therefore $\lim_k (Q^k)_{ij} = 0$ by Proposition 5-2, since the terms of a convergent series tend to 0.
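For a finite transient block, the series of Proposition 5-2 can be summed directly, and Proposition 5-3 guarantees that the powers $Q^k$ die out. The Python sketch below uses a made-up $3 \times 3$ block $Q$ whose rows sum to less than 1 (the missing mass is assumed to go to absorbing states that are not shown). It accumulates $I + Q + Q^2 + \cdots$ until the powers are negligible and checks that the result is a two-sided inverse of $I - Q$. The helper names are ours, chosen only for this illustration.

```python
# Sketch: N = sum_{k>=0} Q^k for a small, hypothetical transient block Q.

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_sum(Q, tol=1e-15, max_terms=100_000):
    """Accumulate I + Q + Q^2 + ... until every entry of the current power is below tol."""
    n = len(Q)
    total, power = identity(n), identity(n)
    for _ in range(max_terms):
        power = mat_mul(power, Q)
        total = [[t + p for t, p in zip(rt, rp)] for rt, rp in zip(total, power)]
        if max(abs(x) for row in power for x in row) < tol:
            break
    return total

# Hypothetical transient part of a chain; each row sums to less than 1.
Q = [[0.0, 0.5, 0.0],
     [0.25, 0.0, 0.5],
     [0.0, 0.25, 0.0]]

N = neumann_sum(Q)
I_minus_Q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(3)] for i in range(3)]
left, right = mat_mul(N, I_minus_Q), mat_mul(I_minus_Q, N)
print(N[0][0])   # mean number of visits to the first state, starting there
```

Truncating the series leaves an error of order $Q^{K+1}$, which is why the stopping rule looks at the size of the current power.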


We recall that $\bar N_{ij} = M_i[\bar n_j]$, and $N = I + \bar N$. Hence $\bar N = \sum_{k=1}^{\infty} Q^k$.


Proposition 5-4: If $P$ is a transient chain, then
$$\bar N = QN,$$
$$N = I + QN,$$
$$N_{ij} = H_{ij}N_{jj},$$
$$\bar N_{ij} = \bar H_{ij}N_{jj},$$
$$N_{jj} = 1 + \bar H_{jj}N_{jj} = \frac{1}{1 - \bar H_{jj}}.$$

PROOF: The first two assertions are restatements of Proposition 4-13
for the case where $Q$ is our transition matrix. The last four results are
a restatement of Proposition 4-15.


Note that the conclusions of Proposition 5-4 show how to compute
$N$ from $H$ and $\bar H$. Our next result establishes a method of finding $N$
without using the $H$-matrix. For finite matrices the knowledge that $N$
is a (two-sided) inverse of $I - Q$ is sufficient to determine $N$ uniquely,
but for infinite matrices it is not. For if $r$ is a $Q$-regular column vector
and $\beta$ is a $Q$-regular row vector, then $N + r\beta$ is a second two-sided
inverse of $I - Q$, since $(I - Q)r = 0$ and $\beta(I - Q) = 0$. We shall see that such regular vectors $r$ and $\beta$ often
exist.

In Section 2 we shall obtain a refinement of Proposition 5-5 by proving that $N$ is the unique minimum non-negative inverse of $I - Q$ on
each side.


Proposition 5-5: $N(I - Q) = (I - Q)N = I$ and $QN = NQ \le N$.
In particular, every row of $N$ is a $Q$-superregular measure, and every
column of $N$ is a $Q$-superregular function.

PROOF: The second and third assertions follow from the first, and
$QN = N - I$ by Proposition 5-4. Also $NQ = N - I$ by Proposition
5-2 and monotone convergence. Since $N$ has finite entries, the first
assertion follows.


If $P$ is a transient chain with a non-empty set $E$ of absorbing states,
we define the absorption matrix $B$ to have index sets $\tilde E$ and $E$ and to
have entries
$$B_{ij} = \Pr_i[\text{process is absorbed at } j].$$

The $B$-matrix is not square; it has the same index sets as the $R$-matrix.


Proposition 5-6: If P is a transient chain with a non-empty set of 
absorbing states, then B = NR. 


PROOF: Let $i$ be transient and let $j$ be absorbing. By Theorem 4-10
with the random time equal to the constant $n$ and with the statement
$p$ taken as the assertion that the process is absorbed at $j$ on the $(n + 1)$st
step, we have
$$\Pr_i[p] = \sum_k (P^n)_{ik} P_{kj} = \sum_k (Q^n)_{ik} R_{kj},$$
where $k$ runs over the transient states. Summing on $n$, we find
$$B_{ij} = \sum_n \sum_k (Q^n)_{ik} R_{kj} = (NR)_{ij}.$$

As a result, we see from the proof of Lemma 5-1 that if $P$ is a transient
chain, then
$$\lim_k P^k = \begin{pmatrix} I & 0 \\ B & 0 \end{pmatrix}.$$
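Proposition 5-6 and the limit of $P^k$ can be checked on a small case. The sketch below assumes, purely for illustration, the drunkard’s walk on $\{0, 1, 2, 3\}$ with $p = q = \frac12$ (the chain of Section 4), written in canonical order with the absorbing states 0 and 3 first. It computes $N = (I - Q)^{-1}$ and $B = NR$ exactly with Python's `fractions` module, then compares $B$ with the lower-left block of a high power of $P$ computed in floating point.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(M):
    """Gauss-Jordan inverse of a small nonsingular matrix of Fractions."""
    n = len(M)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

half = F(1, 2)
# Canonical order: absorbing states 0 and 3 first, then transient states 1 and 2.
Q = [[0, half], [half, 0]]      # 1 -> 2 and 2 -> 1
R = [[half, 0], [0, half]]      # 1 -> 0 and 2 -> 3; columns are states 0 and 3
N = inverse([[F(int(i == j)) - Q[i][j] for j in range(2)] for i in range(2)])
B = mat_mul(N, R)               # Proposition 5-6: B = NR

P = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0.5, 0, 0, 0.5],
     [0, 0.5, 0.5, 0]]
Pk = P
for _ in range(200):            # a high power of P, in floating point
    Pk = mat_mul(Pk, P)
print(B, [row[:2] for row in Pk[2:]])
```

The exact values $B_{10} = \frac23$ and $B_{13} = \frac13$ agree with the formula $B_{i0} = 1 - i/n$ derived in Section 4.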


Let $P$ be an arbitrary Markov chain, let $E$ be a subset of the set of
states, and let $s_E$ be the statement that the process is in states of $E$
infinitely often. Define $s^E$ by $s^E_i = \Pr_i[s_E]$.


Proposition 5-7: For any subset $E$ of states in a Markov chain $P$,
$s^E$ is a $P$-regular function.

PROOF: Letting $p$ be the statement $s_E$, and taking the random time
to be identically one, we see that $p'$ in Theorem 4-10 is also $s_E$ and that
$$\Pr_i[s_E] = \sum_k P_{ik} \Pr_k[s_E],$$
or
$$s^E = Ps^E.$$


For any Markov chain $P$ we define a hitting vector $h^E$ and an escape
vector $e^E$ by
$$h^E_i = \Pr_i[\text{process eventually reaches } E]$$
and
$$e^E_i = \Pr_i[\text{process goes on first step from } E \text{ to } \tilde E \text{ and then never returns to } E].$$

We notice that if $i \in E$, then $h^E_i = 1$, and that if $j \in \tilde E$ then $e^E_j = 0$.
The absorption matrix $B^E$ for the set $E$ is defined to be a square
matrix with index set the set of all states and with entries defined by
$$B^E_{ij} = \Pr_i[\text{process at some time enters } E \text{ and first entry is at state } j].$$

We see that the $B^E$-matrix is computed by finding the entries of the
$B$-matrix for the process with states of $E$ made absorbing. Specifically,
if $E$ is the set of all absorbing states, then, with rows and columns ordered as $E$, $\tilde E$,
$$B^E = \begin{pmatrix} I & 0 \\ B & 0 \end{pmatrix}.$$
The vectors $s^E$, $h^E$, $e^E$ and the matrix $B^E$ are interrelated as in the following
proposition, whose proof is left to the reader.


Proposition 5-8: Let $P$ be an arbitrary Markov chain. Then

1) $h^E = B^E 1$.
2) $h^E = e^E + Ph^E$, and hence $e^E = (I - P)h^E$.
3) $s^E = 1$ if and only if $h^E = 1$ and $P1 = 1$.
4) If $E \subset F$, then $h^E \le h^F$ and $s^E \le s^F$.
5) $s^E = B^E s^E$.
6) If $E \subset F$, then $B^F B^E = B^E$.
7) $s^E = \lim_n P^n h^E$.


2. Superregular functions 


Superregular measures and functions were defined in Section 4-3; a
column vector $h$ is $P$-superregular if $h \ge Ph$. Let $P$ be a transient chain, and
let $Q$ be the restriction of $P$ to transient states. As we have seen
before, $Q$ is a transition matrix. Our object in this section is to obtain
a standard decomposition of non-negative $Q$-superregular functions
and to use it in a consideration of the solutions to the equation
$(I - Q)x = f$. Our results will hold equally well for $Q$-superregular
measures, but we shall not supply the proofs. A rigorous way of transforming
theorems and proofs about functions into theorems and proofs
about measures will emerge later when we discuss duality. Generalizations of the present results will arise in the study of potential theory.

The later transformation of theorems about functions into theorems
about measures by duality will require the existence of a positive finite-valued $Q$-superregular measure. Any row of $N$ will suffice if all pairs of
transient states communicate, but if not, we proceed as follows: Number
the states, beginning with 1, and take
$$\beta = \sum_i 2^{-i} N^{(i)},$$
where $N^{(i)}$ is the $i$th row of $N$. It is clear that $\beta$ is superregular because
it is the sum of non-negative superregular measures; $\beta$ is positive
because $\beta_j = \sum_i 2^{-i}N_{ij} \ge 2^{-j}N_{jj} \ge 2^{-j} > 0$. Finally $\beta$ is finite-valued because
$$\beta_j = \sum_i 2^{-i}N_{ij} = \sum_i 2^{-i}H_{ij}N_{jj} \le \sum_i 2^{-i}N_{jj} \le N_{jj} < \infty.$$
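The construction of $\beta$ can be tried on a small example. In the hypothetical block $Q$ below, the third state cannot be reached from the first two, so the first row of $N$ has a zero entry, while the weighted sum of rows is positive. The sketch uses exact `fractions` arithmetic and checks that $\beta > 0$ and $\beta Q \le \beta$. It also checks that $\beta(I - Q) = (\frac12, \frac14, \frac18)$, which is what $N(I - Q) = I$ predicts.

```python
from fractions import Fraction as F

# Hypothetical transient block: states 1 and 3 both lead to state 2,
# while state 3 cannot be reached from states 1 or 2.
Q = [[F(0), F(1, 2), F(0)],
     [F(0), F(0),    F(0)],
     [F(0), F(1, 2), F(0)]]
n = len(Q)

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# N = I + Q + Q^2 + ...; this Q is nilpotent, so n further terms are more than enough.
N = [[F(int(i == j)) for j in range(n)] for i in range(n)]
power = [row[:] for row in N]
for _ in range(n):
    power = mat_mul(power, Q)
    N = [[x + y for x, y in zip(rn, rp)] for rn, rp in zip(N, power)]

# beta_j = sum_i 2^{-i} N_ij, with states numbered from 1 as in the text.
beta = [sum(F(1, 2 ** (i + 1)) * N[i][j] for i in range(n)) for j in range(n)]
beta_Q = [sum(beta[i] * Q[i][j] for i in range(n)) for j in range(n)]
print(N[0], beta)
```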


The lemma and theorem to follow hold for arbitrary Markov chains. 
In the transient case they will most often be applied to the chain Q. 
The theorem has an analog in classical potential theory, but we 
postpone a discussion of this point until the end of Section 8-1 after 
we have introduced Markov chain potentials. 


Lemma 5-9: Let $P$ be any Markov chain and let $N = \sum_{n=0}^{\infty} P^n$. If $Nf$
is well defined and finite-valued, then $(I - P)(Nf) = f$.

PROOF: Write $f = f^+ - f^-$. Then $Nf^+$ and $Nf^-$ are both finite-valued by hypothesis. Since $PN + I = N$, we have $PN \le N$ and
hence $PNf^+ \le Nf^+$ and $PNf^- \le Nf^-$. Therefore, by Corollary 1-5,
$$(I - P)(Nf) = Nf - P(Nf) = Nf - (PN)f = Nf - (N - I)f = f.$$


Theorem 5-10: Let $P$ be any Markov chain and let $N = \sum_{n=0}^{\infty} P^n$.
Any non-negative $P$-superregular finite-valued function $h$ has a unique
representation $h = Nf + r$, where $r$ is regular. In the representation
$f$ and $r$ are both non-negative, and $f = (I - P)h$.

PROOF: Since $h$ is $P$-superregular,
$$h \ge Ph \ge P^2h \ge \cdots \ge 0.$$
Thus $P^n h$ converges to a non-negative function $r$. By the Dominated
Convergence Theorem,
$$Pr = P\left(\lim_n P^n h\right) = \lim_n P^{n+1}h = r.$$
Hence $r$ is regular. Also
$$h = P^{n+1}h + (I + P + \cdots + P^n)(h - Ph).$$
Since $h - Ph \ge 0$, we may apply monotone convergence in passing to
the limit on $n$; we obtain
$$h = r + N(h - Ph).$$
Set $f = h - Ph$, and existence follows.

For uniqueness, suppose that $h = r' + Nf'$ with $r'$ regular. Then
$Nf$ and $Nf'$ are finite-valued since $h$ is. Multiplying the equation
$$r + Nf = r' + Nf'$$
through by $I - P$ and applying Lemma 5-9, we obtain $f = f'$.
Hence also $r = r'$.
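Here is a small numerical illustration of Theorem 5-10. The setup is an assumption made only for the example: $P$ is the drunkard’s walk on $\{0, 1, 2, 3\}$ with $p = q = \frac12$ and absorbing endpoints, and $h = (1, 2, 2, 1)$, which is non-negative and superregular. The sketch computes $f = (I - P)h$. It then approximates $r = \lim_n P^n h$ and the potential $Nf = \sum_n P^n f$ by stopping after 200 terms, and checks that $h = Nf + r$. Because $f$ vanishes at the absorbing states, the series for $Nf$ converges even though $N$ has infinite entries there.

```python
def mat_vec(P, x):
    return [sum(pij * xj for pij, xj in zip(row, x)) for row in P]

# Drunkard's walk on {0, 1, 2, 3}, natural order, absorbing endpoints.
P = [[1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 1.0]]
h = [1.0, 2.0, 2.0, 1.0]                   # hypothetical superregular function

Ph = mat_vec(P, h)
f = [hv - phv for hv, phv in zip(h, Ph)]   # f = (I - P)h, non-negative here

r = h[:]                                   # r = lim P^n h, a decreasing limit
for _ in range(200):
    r = mat_vec(P, r)

potential, term = [0.0] * 4, f[:]          # Nf = f + Pf + P^2 f + ...
for _ in range(200):
    potential = [a + b for a, b in zip(potential, term)]
    term = mat_vec(P, term)

print(f, r, potential)
```

In this example the regular part $r$ is the constant 1, the value of $h$ at both absorbing states.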


We return now to the special case of transient chains, where $N = \sum_n Q^n$. A solution $g$ to an equation is the minimum non-negative
solution if whenever $h$ is a non-negative solution, we have $h \ge g \ge 0$.


Proposition 5-11: If $f \ge 0$ and if $Nf$ is finite, then $Nf$ is the minimum
non-negative solution of $(I - Q)x = f$.

PROOF: By Lemma 5-9, $Nf$ is a solution. Let $x$ be any non-negative
solution. Then $x$ is finite-valued and superregular, since $x - Qx = f \ge 0$. By Theorem
5-10, $x = Nf + r$ where $r \ge 0$. Hence $x \ge Nf$.


It follows that $N$ is the minimum non-negative right inverse of
$I - Q$. To prove that the $j$th column of $N$ is minimum, define $f$ by
$f_i = \delta_{ij}$ and then apply Proposition 5-11. After the analog of Proposition 5-11 for measures has been established, we find similarly that
$N$ is the minimum non-negative left inverse of $I - Q$.


3. Absorbing chains 


A class of Markov chains of special interest is the class of absorbing 
chains. We shall use the material developed in the two preceding 
sections to establish the basic facts about absorbing chains. 


Definition 5-12: A Markov chain P is said to be absorbing if, for every 
starting state, the probability of ending in an absorbing state is one. 


If $P$ is a Markov chain containing a recurrent nonabsorbing state $i$,
then the process cannot be absorbed if it is started in state $i$. That is,
all absorbing chains are transient chains. It is not true, however, that
all transient chains are absorbing. The property $P1 = 1$ is a necessary condition. But even it is not sufficient, since the basic example
can be transient but is never absorbing.

The proposition to follow is the special case of the identity $B^E 1 = h^E$
in which $E$ is the set of all absorbing states.

Proposition 5-13: If $P$ is a transient chain, then $P$ is absorbing if and
only if $B1 = 1$.


The next two propositions give two ways in which absorbing chains 
arise. 


Proposition 5-14: If P is a finite transient chain such that P1 = 1, 
then P is absorbing. 


PROOF:
$$\begin{aligned} B1 &= (NR)1 = N(R1) \\ &= N[(I - Q)1] && \text{since } (R1 + Q1)_i = (P1)_i = 1 \\ &= [N(I - Q)]1 && \text{by Corollary 1-6} \\ &= 1. \end{aligned}$$


Let $a(\omega)$ be the time on the path $\omega$ of a chain $P$ at which absorption takes
place. If the process is not absorbed along $\omega$, define $a(\omega) = +\infty$.
Since $a(\omega) = \sum_j n_j(\omega)$, where the sum is taken over the transient states,
we see that $a$ is measurable and we conclude that $a$ is a random time.
Define the column vector $a$ by $a_i = M_i[a]$. The vector $a$ is indexed
by the transient states. It is clear that the chain $P$ is absorbing if and
only if $a$ is finite a.e.


Proposition 5-15: If $P$ is a recurrent chain and if ${}^E P$ is the Markov
chain obtained by making a non-empty set $E$ of states absorbing, then
${}^E P$ is absorbing.

PROOF: Let $j \in E$. Since $H_{ij} = 1$ for every $i$, $t_j(\omega)$ is finite almost
everywhere. But $a(\omega) \le t_j(\omega)$, and ${}^E P$ is thus absorbing.


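The notation ${}^E P$ will be used in later sections to refer either to the
chain $P$ with the states of $E$ made absorbing or to the chain $P$ made so that
it disappears instead of entering $E$. If
$$P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix},$$
then these two chains are, respectively,
$$\begin{pmatrix} I & 0 \\ R & Q \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 0 \\ 0 & Q \end{pmatrix}.$$
It will be clear from the context which one is meant.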

There is one more important way in which absorbing chains arise.
Suppose that $P1 \ne 1$. If we add an absorbing state 0, the state
“stopped,” to the state space $S$ and define $\hat P$ by
$$\hat P_{0j} = \delta_{0j}, \qquad \hat P_{i0} = 1 - \sum_{k \in S} P_{ik} \quad \text{if } i \ne 0, \qquad \hat P_{ij} = P_{ij} \quad \text{for } i, j \in S,$$
then $\hat P$, called the enlarged chain, may be absorbing. If $P$ is a finite
transient chain, then $\hat P$ necessarily will be absorbing by Proposition 5-14.
With this set of propositions to indicate how absorbing chains arise,
we conclude with an investigation of the properties of the vector $a$.


Proposition 5-16: If $P$ is a transient chain, then $a = N1$.

PROOF:
$$\begin{aligned} (N1)_i &= \sum_j N_{ij} \quad \text{summed over the transient states } j \\ &= \sum_j M_i[n_j] \\ &= M_i\Big[\sum_j n_j\Big] \quad \text{by monotone convergence.} \end{aligned}$$
But $a = \sum_j n_j$, where the sum is taken over transient states $j$. Thus
$(N1)_i = M_i[a] = a_i$.


Corollary 5-17: If $P$ is a transient chain for which $P1 = 1$ and if $a$
has only finite entries, then $x = a$ is the unique minimum non-negative
solution of the equation $(I - Q)x = 1$.

PROOF: It is the unique minimum non-negative solution by Proposition 5-16 and Proposition 5-11.


4. Finite drunkard’s walk 


The finite drunkard’s walk is a Markov chain defined on the integers
$\{0, 1, \ldots, n\}$ with states 0 and $n$ absorbing and with transition
probabilities
$$P_{i,i+1} = p \quad \text{and} \quad P_{i,i-1} = q = 1 - p \qquad \text{for } 0 < i < n.$$

If we set $r = q/p$, two cases arise. Either $r = 1$ and $\{x_n\}$ is a martingale,
or $r \ne 1$ and $\{r^{x_n}\}$ is a martingale.

We shall use the second martingale systems theorem (Theorem 3-15)
to compute the entries of $B$, $H$, $N$, and $a$ for the case $r = 1$.

To compute the entries of the $B$-matrix when $r = 1$, we note that
$B1 = 1$ by Proposition 5-14; therefore $B_{i0} = 1 - B_{in}$ for each transient
state $i$. Since $\{x_n\}$ is a bounded martingale, Corollary 3-16 applies
with the time taken as the time of absorption (which is a stopping time
because $P$ is absorbing, so that $a$ is finite a.e.). Then
$$i = 0 \cdot B_{i0} + nB_{in},$$
so that
$$B_{in} = i/n \quad \text{and} \quad B_{i0} = 1 - i/n.$$


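To find the entry $H_{ij}$ of the $H$-matrix, we make state $j$ absorbing and
consider the resulting process. If $i < j$, the modified process is the
drunkard’s walk on the integers $\{0, \ldots, j\}$ with $j$ absorbing. Hence
$H_{ij} = i/j$. If $i > j$, the modified process is the drunkard’s walk on
$\{j, \ldots, n\}$. Renumbering the states, we can consider the process as
starting at $i - j$ and taking place on the states $\{0, \ldots, n - j\}$. Thus,
$$H_{ij} = 1 - \frac{i - j}{n - j} = \frac{n - i}{n - j}.$$
To get $\bar H_{ii}$, where $i$ is transient, we use the fact that $\bar H = PH$, so that
$$\bar H_{ii} = pH_{i+1,i} + qH_{i-1,i} = \frac12\left(\frac{n - i - 1}{n - i} + \frac{i - 1}{i}\right) = 1 - \frac{n}{2i(n - i)},$$
since $p = q = \frac12$.
The $N$-matrix is determined as a function of $H$ and $\bar H$ by
$$N_{jj} = \frac{1}{1 - \bar H_{jj}} = \frac{2j(n - j)}{n}$$
and
$$N_{ij} = H_{ij}N_{jj}.$$
We find
$$N_{ij} = \begin{cases} \dfrac{2}{n}\, i(n - j) & \text{for } i \le j, \\[1ex] \dfrac{2}{n}\, j(n - i) & \text{for } i \ge j. \end{cases}$$

Finally, we have
$$a_i = M_i[a] = (N1)_i = \sum_{j=1}^{n-1} N_{ij} = i(n - i).$$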
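The closed forms for $N$, $B_{in}$, and $a$ can be checked by inverting $I - Q$ exactly. The sketch below is a verification aid, not the martingale method of the text. It builds $I - Q$ for the transient states $1, \ldots, n - 1$ of the finite drunkard’s walk with $p = \frac12$, inverts it with `fractions.Fraction` arithmetic, and compares the result with the formula for $N_{ij}$. It also checks $B_{in} = N_{i,n-1}\,p$ (the only one-step route into $n$ is from $n - 1$) and $a_i = i(n - i)$. The value $n = 6$ is arbitrary.

```python
from fractions import Fraction as F

def inverse(M):
    """Gauss-Jordan inverse of a small nonsingular matrix of Fractions."""
    n = len(M)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def drunkard_N(n, p):
    """N = (I - Q)^{-1}; row/column t stands for transient state t + 1."""
    q = 1 - p
    m = n - 1
    I_minus_Q = [[F(int(s == t)) - (p if t == s + 1 else q if t == s - 1 else 0)
                  for t in range(m)] for s in range(m)]
    return inverse(I_minus_Q)

n, p = 6, F(1, 2)
N = drunkard_N(n, p)
mismatches = [(i, j) for i in range(1, n) for j in range(1, n)
              if N[i - 1][j - 1] != F(2, n) * (i * (n - j) if i <= j else j * (n - i))]
B_n = [N[i - 1][n - 2] * p for i in range(1, n)]   # B_{in} = (NR)_{in}
a = [sum(N[i - 1]) for i in range(1, n)]           # a = N1 (Proposition 5-16)
print(mismatches, B_n, a)
```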

The case $r \ne 1$ proceeds in the same way. By the systems theorem
$$r^i = r^0 B_{i0} + r^n B_{in}.$$
From this equation and $B_{i0} + B_{in} = 1$ one easily deduces that
$$B_{i0} = \frac{r^i - r^n}{1 - r^n}.$$
After first computing $H$ and $N$, we find
$$a_i = M_i[a] = \frac{1}{q - p}\left[i - n\,\frac{1 - r^i}{1 - r^n}\right].$$
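The formulas for $r \ne 1$ can be checked the same way. The sketch below assumes, purely for illustration, $n = 7$ and $p = \frac25$ (so $r = \frac32$). It solves $(I - Q)x = R_{\cdot 0}$ for the column $B_{\cdot 0}$ and $(I - Q)x = 1$ for $a$ by exact Gaussian elimination, then compares both with the closed forms above.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A small and nonsingular)."""
    size = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(size)]
    for col in range(size):
        pivot = next(row for row in range(col, size) if aug[row][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(size):
            if row != col and aug[row][col] != 0:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]

n, p = 7, F(2, 5)
q = 1 - p
r = q / p                                   # r = 3/2 > 1
m = n - 1                                   # transient states 1..n-1 -> indices 0..m-1
I_minus_Q = [[F(int(s == t)) - (p if t == s + 1 else q if t == s - 1 else 0)
              for t in range(m)] for s in range(m)]
R0 = [q if s == 0 else F(0) for s in range(m)]   # one-step absorption at 0 (from state 1)
B0 = solve(I_minus_Q, R0)                        # column B_{.0} = N R_{.0}
a = solve(I_minus_Q, [F(1)] * m)                 # a = N1

closed_B0 = [(r**i - r**n) / (1 - r**n) for i in range(1, n)]
closed_a = [(i - n * (1 - r**i) / (1 - r**n)) / (q - p) for i in range(1, n)]
print(B0 == closed_B0, a == closed_a)
```

Exact rational arithmetic makes the comparison an equality test rather than a tolerance check.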

When $r > 1$, this process is sometimes known as “gambler’s ruin”
because of the following interpretation. A gambler walks into a
gambling house with $i$ dollars in his pocket, and the house has $n - i$
dollars to bet against him. In a given game the gambler has probability $p$ of winning. Since the house fixes the odds, we have $p < \frac12$
and therefore $r > 1$. If the game is played repeatedly, $x_k$ in the above
Markov chain represents the gambler’s cash after $k$ games, and $B_{i0}$ is
the probability of his eventual ruin. Since $r > 1$,
$$B_{i0} = \frac{1 - r^{-(n-i)}}{1 - r^{-n}} > 1 - r^{-(n-i)}$$
is nearly 1 when $n - i$ (the house’s capital) is large. Thus the gambler
is nearly sure to be ruined, no matter how rich he is. However, $a_i$ is
approximately $i/(q - p)$, which is very large if $i$ is substantial and $p$
is near to $\frac12$. Thus the gambler is likely to have a long run for his
money.


5. Infinite drunkard’s walk 


Extending the finite drunkard’s walk to a process $P$ defined on all
of the non-negative integers, we set
$$P_{0i} = \delta_{0i}, \qquad P_{i,i+1} = p \quad \text{for } 0 < i < \infty, \qquad P_{i,i-1} = q = 1 - p \quad \text{for } 0 < i < \infty.$$

Again we take $r = q/p$.
Our first problem is to establish the sense in which the infinite
drunkard’s walk $P$ is the limiting case of the finite drunkard’s walk.
Let ${}^nB$, ${}^nN$, and ${}^na$ denote, respectively, the absorption matrix, the
fundamental matrix, and the mean time to absorption vector for the
finite drunkard’s walk on the integers $\{0, \ldots, n\}$. Define in connection
with the infinite drunkard’s walk the random variable ${}^k n_j$ to be the
number of times the process is in state $j$ up to the first time it is in state
$k$. Let $p$ be the statement that the process is not absorbed at state 0,
and let $p_k$ be the statement that the process is not absorbed at state 0
at any time up to the time it reaches state $k$.


Proposition 5-18: In the infinite drunkard’s walk
$$B_{i0} = \lim_n {}^nB_{i0}, \qquad N_{ij} = \lim_n {}^nN_{ij}, \qquad \text{and} \qquad a_i = \lim_n {}^na_i.$$

PROOF: We have $\Pr_i[p] = 1 - B_{i0}$ and $\Pr_i[p_n] = 1 - {}^nB_{i0}$. Since
the truth sets of the $p_n$ form a decreasing sequence whose intersection is the truth set of $p$, we have, by
Proposition 1-16,
$$1 - B_{i0} = \lim_n (1 - {}^nB_{i0}).$$

For the $N$-matrix we note that
$${}^nN_{ij} = M_i[{}^n n_j]$$
and
$$n_j = \lim_n {}^n n_j \quad \text{monotonically}.$$
The result for $N$ therefore follows from the Monotone Convergence
Theorem. Since $a_i = M_i[a] = \sum_j N_{ij}$, the assertion about $a_i$ is also a
consequence of monotone convergence.


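Taking the limits of some of the quantities computed for the finite
drunkard’s walk we find that
$$B_{i0} = \begin{cases} 1 & \text{if } r \ge 1, \\ r^i & \text{if } r < 1, \end{cases}$$
and
$$a_i = \begin{cases} \dfrac{i}{q - p} & \text{if } r > 1, \\[1ex] +\infty & \text{if } r \le 1. \end{cases}$$

The value of $B_{i0}$ shows that the chain is absorbing when $p \le q$;
that is, $a$ is finite almost everywhere when $p \le q$. However, $M_i[a]$ is
finite only when $p < q$.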
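The limits in Proposition 5-18 can be watched numerically through the closed forms of Section 4. In the sketch below the values $p = 0.4$ (so $r > 1$) and $p = 0.6$ (so $r < 1$) are arbitrary choices. Floating-point overflow of $r^n$ limits the sketch to moderate $n$.

```python
def finite_B0(i, n, p):
    """B_{i0} for the finite drunkard's walk on {0, ..., n}; assumes p != 1/2."""
    r = (1 - p) / p
    return (r**i - r**n) / (1 - r**n)

def finite_a(i, n, p):
    """Mean absorption time a_i for the same walk; assumes p != 1/2."""
    q = 1 - p
    r = q / p
    return (i - n * (1 - r**i) / (1 - r**n)) / (q - p)

i = 3
for n in (10, 20, 40, 80, 160):
    # p = 0.4: B_{i0} -> 1 and a_i -> i/(q - p) = 15;  p = 0.6: B_{i0} -> r^i = 8/27
    print(n, finite_B0(i, n, 0.4), finite_a(i, n, 0.4), finite_B0(i, n, 0.6))
```

For $p = 0.6$ the finite means ${}^na_i$ keep growing with $n$, consistent with $a_i = +\infty$ when $r < 1$.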

If we have calculated $B_{i0}$ and have seen that $a$ is a stopping time
when $p < q$, we may compute $M_i[a]$ directly from martingales without
any knowledge of the $N$-matrix. Let $p < q$, and for $0 < s < 1$ define
$$f(k, n) = \frac{s^k}{(ps + qs^{-1})^n},$$
where $n$ represents time and $k$ represents position. Since $k \ge 0$ and $s < 1$,
$$f(k, n) = s^k(ps + qs^{-1})^{-n} \le (ps + qs^{-1})^{-n}.$$
The function $ps + qs^{-1}$ decreases for $0 < s \le \sqrt{q/p}$, and this interval contains $(0, 1]$ because $p < q$; hence $ps + qs^{-1} \ge p + q = 1$ for $0 < s < 1$, and
$$f(k, n) \le (p + q)^{-n} = 1.$$
Hence $f$ is bounded. It is easily seen that $\{f(x_n, n)\}$ is a martingale:
since $f$ is bounded, $M[f]$ is finite; the reader may verify the regularity
property by showing that
$$pf(k + 1, n + 1) + qf(k - 1, n + 1) = f(k, n).$$
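Let $a$ be the absorption time, which is a stopping time since $p < q$, and apply Corollary 3-16. Taking $i$ as the starting
state, and noting that $x_a = 0$, we have
$$s^i = \sum_n \Pr_i[a = n]\,\frac{s^0}{(ps + qs^{-1})^n} = \sum_n \Pr_i[a = n]\,(ps + qs^{-1})^{-n}.$$
Set $1/u = ps + qs^{-1}$. Then
$$s = \frac{1 - \sqrt{1 - 4pqu^2}}{2pu}$$
and
$$\sum_n \Pr_i[a = n]\,u^n = \left(\frac{1 - \sqrt{1 - 4pqu^2}}{2pu}\right)^i.$$
Defining
$$\varphi(u) = \sum_n \Pr_i[a = n]\,u^n,$$
we note that
$$\varphi'(u) = \sum_n n\Pr_i[a = n]\,u^{n-1}$$
and that
$$\varphi'(1) = \sum_n n\Pr_i[a = n] = M_i[a] = a_i.$$
Using the fact that $\sqrt{1 - 4pq} = q - p$ to calculate $\varphi'(1)$, we find that
$a_i = i/(q - p)$, in agreement with the result obtained by the longer
method.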

The present method further allows us to find the probability distribution of $a$ by expanding $[(1 - \sqrt{1 - 4pqu^2})/(2pu)]^i$ as a
power series in $u$. We thus find that
$$\Pr_i[a = n] = 0 \quad \text{for } n < i,$$
$$\Pr_i[a = i] = q^i,$$
$$\Pr_i[a = i + 1] = 0,$$
$$\Pr_i[a = i + 2] = ipq^{i+1},$$
and so on.
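The first few terms of this distribution, and the mean $a_i = i/(q - p)$, can be checked without any series expansion by pushing probability mass forward one step at a time. The sketch below does this for the illustrative choice $i = 3$, $p = 0.3$. It stops after 400 steps, which is harmless here because the tail of the distribution decays geometrically when $p < q$.

```python
def absorption_time_law(i, p, n_max):
    """Pr_i[a = n], n = 0..n_max, for the infinite drunkard's walk with 0 absorbing."""
    q = 1 - p
    law = [0.0] * (n_max + 1)
    if i == 0:
        law[0] = 1.0
        return law
    mass = {i: 1.0}                  # probability of being at transient state k now
    for n in range(1, n_max + 1):
        new = {}
        for k, w in mass.items():
            new[k + 1] = new.get(k + 1, 0.0) + p * w
            if k == 1:
                law[n] += q * w      # step from 1 to 0: absorbed at time n
            else:
                new[k - 1] = new.get(k - 1, 0.0) + q * w
        mass = new
    return law

i, p = 3, 0.3
law = absorption_time_law(i, p, 400)
mean = sum(n * w for n, w in enumerate(law))
print(law[3], law[4], law[5], mean)  # compare with q^i, 0, i p q^(i+1), i/(q - p)
```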


6. A zero-one law for sums of independent random variables 


Historically, the first infinite Markov chain that was studied was the 
sums of independent random variables process. We gather some of the 
results in the next few sections, beginning with two propositions and a 
corollary of rather general applicability. 


Proposition 5-19: If $P$ is a Markov chain for which the only bounded
$P$-regular functions are constant vectors, then, for each subset of states
$E$, $\Pr_i[s_E] = 0$ or 1, independently of the starting state $i$.

PROOF: By Proposition 5-7, $s^E$ is regular and it is clearly bounded;
therefore $s^E = c1$. On the other hand, by Proposition 5-8,
$$s^E = B^E s^E,$$
so that
$$c1 = cB^E1 = ch^E.$$
Therefore, either $c = 0$ or $h^E = 1$. In the latter case $c \ne 0$, so the regularity of $c1$ gives $P1 = 1$, and then $s^E = 1$ by
Proposition 5-8.


As in Example 6 of Section 4-6, we let $p_k = P_{i,i+k}$, which, by
assumption, is independent of $i$.


Proposition 5-20: Let $P$ be the transition matrix of a Markov chain
obtained from sums of independent random variables. If, for each pair
of states $q$ and $r$, there is a state $s$ such that $q$ can be reached from $s$ or $s$
can be reached from $q$ and such that $r$ can be reached from $s$ or $s$ can be
reached from $r$, then the only bounded regular functions are constant
functions. In particular, the hypothesis is satisfied if all pairs of states
communicate.


PROOF: Let $f$ be a non-constant bounded regular function and suppose $f_q \ne f_r$. We
shall assume that $s$ can be reached from both $q$ and $r$; the proof in the
other cases is completely analogous. Let $q, q + a_1, q + a_1 + a_2, \ldots,
q + a_1 + \cdots + a_m$ and $r, r + b_1, \ldots, r + b_1 + \cdots + b_n$ be respective
paths of positive probability leading from $q$ to $s$ and from $r$ to $s$. Then
$$f_{q + a_1 + \cdots + a_m} = f_s = f_{r + b_1 + \cdots + b_n},$$
so that at least one of the equalities in the two chains
$$f_q = f_{q + a_1} = f_{q + a_1 + a_2} = \cdots = f_{q + a_1 + \cdots + a_m}$$
and
$$f_r = f_{r + b_1} = \cdots = f_{r + b_1 + \cdots + b_n}$$
must be false, since otherwise $f_q = f_r$. Without loss of generality, let
$f_{q + a_1 + \cdots + a_{k-1}} \ne f_{q + a_1 + \cdots + a_k}$ for some $k$, and let $a = a_k$. Then $p_a > 0$. Let $g_i = f_{i+a} - f_i$. Then $g$ is not identically 0. Further, $g$ is regular because
$$\begin{aligned} \sum_j P_{ij}g_j &= \sum_j P_{ij}(f_{j+a} - f_j) \\ &= \sum_j P_{ij}f_{j+a} - \sum_j P_{ij}f_j \\ &= \sum_j P_{i+a,j+a}f_{j+a} - \sum_j P_{ij}f_j \\ &= f_{i+a} - f_i \\ &= g_i. \end{aligned}$$
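Suppose that for all $i$, $|f_i| \le c$. Then $|g_i|$ is bounded by $2c$. Since
multiplying $g$ by $-1$ affects neither its regularity nor its boundedness,
we may assume $b = \sup_i g_i$ is positive and finite. For any $i$ and any
$m > 0$,
$$\left|\sum_{k=0}^{m-1} g_{i+ka}\right| = |f_{i+ma} - f_i| \le 2c.$$
Choose $N$ so that $N \cdot b/2 > 2c$. Let $p^{(n)} = P^{(n)}_{i,i+na} \ge (p_a)^n > 0$ (this quantity does not depend on $i$), let
$p = \min_{n < N} p^{(n)}$, and let $t$ be a state such that $g_t > b(1 - p/2)$. A
choice for $t$ exists since $b$ is finite and since $p > 0$. Then for $n < N$,
$$\begin{aligned} b\left(1 - \frac{p^{(n)}}{2}\right) \le b\left(1 - \frac{p}{2}\right) &< g_t = \sum_k P^{(n)}_{tk}\,g_k \\ &= P^{(n)}_{t,t+na}\,g_{t+na} + \sum_{k \ne t+na} P^{(n)}_{tk}\,g_k \\ &\le p^{(n)}g_{t+na} + (1 - p^{(n)})b. \end{aligned}$$
Thus
$$g_{t+na} > b/2 \quad \text{for } n < N.$$
Hence
$$\sum_{k=0}^{N-1} g_{t+ka} > N \cdot \frac{b}{2} > 2c,$$
a contradiction.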

Corollary 5-21: If $P$ is a sums of independent random variables
Markov chain in which all pairs of states communicate, then for each
subset of states $E$, $\Pr_i[s_E] = 0$ or 1, independently of the starting state $i$.


7. Sums of independent random variables on the line 


Let $P$ be a sums of independent random variables Markov chain
indexed by the integers and defined by the probability distribution
$\{p_k\}$. The set of integers $k$ for which $p_k > 0$ we shall call the set of
$k$-values associated with the chain. We shall assume that the greatest
common divisor of the $k$-values is one. Thus, if both positive and
negative $k$-values exist, we see (from Lemma 1-66, for example) that all
pairs of states communicate.

The mean $m$ for the process is defined by $m = \sum_k kp_k$, and is said to
exist if and only if the positive and negative parts of the sum are not
both infinite. In this section we shall establish the following result.


Proposition 5-22: If $P$ is a Markov chain representing sums of
independent random variables on the line, if there are finitely many
$k$-values and if they have greatest common divisor one, if $\sum_k p_k = 1$,
and if $m = \sum_k kp_k$, then in order for the chain to be recurrent it is
necessary and sufficient that $m = 0$.


Before we come to the proof, two comments are in order. The first 
is that the proposition can be generalized to the case where there are 
infinitely many k-values as long as the mean m still exists and the k- 
values still have greatest common divisor one. The same condition 
m = 0 is necessary and sufficient for P to be recurrent. The second 
comment is that the necessity of the condition m = 0 is an immediate 
consequence of the Strong Law of Large Numbers (Theorem 3-19) and 
that the special added assumption we used in the proof of that theorem 
translates exactly into the condition that there are only finitely many 
k-values. Nevertheless, we give a different proof. 

For the proof we may assume that both positive and negative
$k$-values exist. Otherwise, the chain is obviously transient. Recalling
our discussion in Example 2 of Section 3-2, we observe that if both
positive and negative $k$-values exist, then there are either two distinct
real roots or one double real root of the equation
$$f(s) = \sum_k p_k s^k = 1.$$

If there is a root $r$ other than $s = 1$, then $\{r^{x_n}\}$ is a non-negative martingale. And if $s = 1$ is a double root, then $\{x_n\}$ is a martingale.

But $f'(1) = \left(\sum_k kp_k s^{k-1}\right)_{s=1} = \sum_k kp_k = m$, and $f(s) = 1$ has a double
root at $s = 1$ if and only if $m = 0$. In the case $m \ne 0$, $\{r^{x_n}\}$ converges
a.e. and must converge to the zero function. Thus, according as
$r < 1$ or $r > 1$, we have $\lim x_n = +\infty$ or $\lim x_n = -\infty$. Hence, the
process returns to each state only finitely often with probability one.
But if the chain were recurrent, each state would be reached infinitely
often with probability one. Therefore, if $m \ne 0$, the chain is transient.
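The case analysis above turns on the second positive root of $f(s) = 1$. As a rough numerical companion, the bisection sketch below locates that root for a finitely supported step law. The law is passed as a dictionary mapping $k$ to $p_k$, a representation chosen only for this example. The sketch relies on three facts used in the text: $f$ is convex on $(0, \infty)$, $f(1) = 1$, and $f'(1) = m$. So for $m > 0$ the other root lies in $(0, 1)$. The case $m < 0$ is reduced to $m > 0$ by replacing $k$ with $-k$, which replaces the root $r$ by $1/r$.

```python
def second_root(dist):
    """Positive root r != 1 of f(s) = sum_k p_k s^k = 1, for a finite step law
    dist = {k: p_k} with both positive and negative k-values. Returns 1.0 when
    the mean m is (numerically) zero, where s = 1 is a double root."""
    m = sum(k * pk for k, pk in dist.items())
    if abs(m) < 1e-12:
        return 1.0
    if m < 0:
        # s -> 1/s turns the law {k: p_k} into {-k: p_k}; its root is 1/r.
        return 1.0 / second_root({-k: pk for k, pk in dist.items()})
    g = lambda s: sum(pk * s**k for k, pk in dist.items()) - 1.0
    delta = 0.5
    while g(1.0 - delta) >= 0.0:   # approach 1 until f < 1 (possible since f'(1) = m > 0)
        delta /= 2
    hi = 1.0 - delta
    lo = hi / 2
    while g(lo) <= 0.0:            # approach 0 until f > 1 (f blows up at 0+)
        lo /= 2
    for _ in range(200):           # bisection keeping g(lo) > 0 >= g(hi)
        mid = (lo + hi) / 2
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# For instance p_{-1} = 2/9, p_1 = 1/3, p_2 = 4/9 has m = 1; its root is near 1/4 < 1,
# which by the argument above means x_n -> +infinity.
print(second_root({-1: 2/9, 1: 1/3, 2: 4/9}))
```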

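In the case in which $m = 0$, let $-u$ be the smallest $k$-value and let $v$
be the largest $k$-value. Let $E$ be the set of states $\{-u, \ldots, -2, -1\}$
and let $F$ be the set of states $\{j, j + 1, \ldots, j + v - 1\}$ for some fixed
$j$. Start the process in state $i$ with $0 \le i < j$, and let $t$ be the time to
reach the set $E \cup F$. The chain stopped at time $t$ is absorbing by
Proposition 5-14, and $t$ is therefore a stopping time. Since $\{x_n\}$ is a
bounded martingale before time $t$, Corollary 3-16 applies. Therefore,
$M_i[x_0] = M_i[x_t]$, and for $0 \le i < j$, we have
$$\begin{aligned} i = M_i[x_0] = M_i[x_t] &= \sum_{k \in E} kB^{E \cup F}_{ik} + \sum_{k \in F} kB^{E \cup F}_{ik} \\ &\ge -u\sum_{k \in E} B^{E \cup F}_{ik} + j\sum_{k \in F} B^{E \cup F}_{ik} \\ &= -u\sum_{k \in E} B^{E \cup F}_{ik} + j\Big(1 - \sum_{k \in E} B^{E \cup F}_{ik}\Big), \end{aligned}$$
the last step by Proposition 5-13, since the stopped chain is absorbing. Then
$$(j + u)\sum_{k \in E} B^{E \cup F}_{ik} \ge j - i,$$
and since entering $E \cup F$ first at a state of $E$ implies reaching $E$,
$$h^E_i \ge \sum_{k \in E} B^{E \cup F}_{ik} \ge \frac{j - i}{j + u}.$$
Letting $j \to \infty$, we find that $h^E_i = 1$ for all $i \ge 0$.
Reversing the argument for $i < 0$ and $F = \{1, \ldots, v\}$, we find
similarly that $h^F_i = 1$ for all $i < 0$. Thus, for any state $i$, $h^{E \cup F}_i = 1$.
By Proposition 5-8, $s^{E \cup F} = 1$. Since $E \cup F$ is a finite set, Proposition
4-28 applies, and the chain is recurrent.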


8. Examples of sums of independent random variables 


Calculations with sums of independent random variables on the line 
normally involve either martingales or difference equations. We shall 
illustrate in this section each of these methods with an example. 


EXAMPLE 1: Let the defining distribution for a Markov chain $P$
representing sums of independent random variables on the integers be
$\{p_1 = q, p_2 = p\}$. The process is obviously transient since $\bar H_{jj} = 0$
and $N_{jj} = 1$ for all $j$. Since $N_{ij} = H_{ij}N_{jj} = H_{ij} = H_{0,j-i}$, we will
have determined the $H$-matrix and the $N$-matrix completely by finding
the value of $H_{0k}$ for all $k$. We note first that $H_{0k} = 0$ if $k$ is negative.
For the case $k > 0$, $\{r^{x_n}\}$ is a nonconstant martingale if $r$ is a nonzero
root other than one of the equation $\sum_k p_k s^k = qs + ps^2 = 1$. Thus
$\{(-1/p)^{x_n}\}$ is a martingale. Taking the stopping time $t$ as the time
when the process reaches or passes state $k$, we find from Corollary 3-16
that for the process started at state 0
$$(-1/p)^0 = H_{0k}(-1/p)^k + (1 - H_{0k})(-1/p)^{k+1}.$$
Therefore,
$$H_{0k} = \frac{1}{1 + p}\left(1 - (-p)^{k+1}\right).$$
It is interesting to note that
$$\lim_{k \to \infty} H_{0k} = \frac{1}{1 + p}.$$

This result can also be obtained from the Renewal Theorem of Section
1-6 if we observe that
$$m = \sum_k kp_k = q + 2p = 1 + p,$$
so that $\lim_k H_{0k} = 1/m$.

EXAMPLE 2: Let $p_{-1} = \frac29$, $p_1 = \frac13$, and $p_2 = \frac49$. Then
$$m = \sum_k kp_k = 1,$$
and the process is transient by Proposition 5-22. Let the transition
matrix be called $P$.
If $g$ is a $P$-regular vector at state $i$, then $g_i = (Pg)_i$ and
$$g_i = \tfrac29 g_{i-1} + \tfrac13 g_{i+1} + \tfrac49 g_{i+2}.$$

We shall need a characterization of such vectors in the calculation of the
$H$-matrix. The difference equation we have just formed may be
rewritten as
$$4g_{i+2} + 3g_{i+1} - 9g_i + 2g_{i-1} = 0.$$

Its characteristic equation (see Section 1-6b) is
$$4k^3 + 3k^2 - 9k + 2 = 0,$$
whose solutions are $k = 1$, $\frac14$, and $-2$. Thus,
$$g_i = A + B\left(\tfrac14\right)^i + C(-2)^i.$$

In finding the entries of the $H$-matrix, we know that $H_{ij} = H_{i-j,0}$;
therefore, it is sufficient to consider only the entries $H_{i0}$. We shall
look at the cases $i > 0$ and $i < 0$ separately.

Suppose $i > 0$. Since the only negative $k$-value is $-1$, it is impossible for the process to start in state $i + 1$ and reach state 0 without
first passing through state $i$. Thus, for $i \ge 0$,
$$H_{i+1,0} = H_{i+1,i}H_{i0} = H_{10}H_{i0},$$
with $H_{00} = 1$. The result is a first-order difference equation whose
solution is $H_{i0} = c(H_{10})^i$. Setting $i = 0$ shows that $c = 1$. Thus
$H_{i0}$ is exponential.

But $\bar H = PH$, and $H_{i0} = \bar H_{i0}$ for $i \ne 0$, so that $H_{i0} = (PH)_{i0}$ for all $i > 0$. Thus $H_{i0}$
satisfies the difference equation
$$g_i = \tfrac29 g_{i-1} + \tfrac13 g_{i+1} + \tfrac49 g_{i+2}$$
for $i > 0$. It therefore satisfies
$$g_{i+1} = \tfrac29 g_i + \tfrac13 g_{i+2} + \tfrac49 g_{i+3}$$
for $i \ge 0$. Hence $H_{i0} = A + B(\frac14)^i + C(-2)^i$ for all $i \ge 0$. Since
$H_{i0}$ is known to be exponential, two of the coefficients $A$, $B$, and $C$ are
zero and the other is one. The alternatives $C = 1$ and $A = 1$ are
eliminated, respectively, by the facts that $-2$ is not a probability and
that $P$ drifts to the right a.e. Thus,
$$H_{i0} = \left(\tfrac14\right)^i \quad \text{for } i \ge 0.$$

For $i < 0$ we again use the fact that $H_{i0} = (PH)_{i0}$ for $i \ne 0$, and we find that $H_{i0}$
is a solution of the equation
$$g_{i-2} = \tfrac29 g_{i-3} + \tfrac13 g_{i-1} + \tfrac49 g_i$$
for all $i \le 1$. Therefore,
$$H_{i0} = A + B\left(\tfrac14\right)^i + C(-2)^i$$
for all $i \le 1$. Known values for $H_{i0}$ when $i = 0$ and $i = 1$ give us
two conditions on the three unknowns $A$, $B$, and $C$. The fact that
$H_{i0} \le 1$ as $i \to -\infty$ tells us that $B = 0$. We have as a result
$$H_{i0} = \begin{cases} \left(\tfrac14\right)^i & \text{for } i \ge 0, \\[0.5ex] \tfrac34 + \tfrac14(-2)^i & \text{for } i \le 1. \end{cases}$$




From a knowledge of the $H$-matrix, we can compute $\bar H$ by $\bar H = PH$.
The entries of the $N$-matrix follow from
$$N_{jj} = \frac{1}{1 - \bar H_{jj}}$$
and
$$N_{ij} = H_{ij}N_{jj}.$$
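The formulas for $H_{i0}$ in Example 2 can be corroborated numerically. The sketch below is our own check, not the book's method. It iterates $h \leftarrow Ph$ with $h_0$ held at 1 on the window $-L \le i \le L$, and any step that leaves the window is counted as never hitting 0. Starting from $h = 0$ away from 0, the iterates increase toward the probability of reaching 0 before leaving the window. For states near 0 this differs from $H_{i0}$ only by terms of order $(\frac14)^L$, because the walk moves down one step at a time and $H_{i+1,i} = \frac14$.

```python
def hit_zero(dist, L=60, sweeps=1000):
    """Approximate H_{i0} for -L <= i <= L by value iteration on a truncated window.
    dist = {k: p_k} is the step law; steps leaving the window count as never hitting 0."""
    h = {i: 0.0 for i in range(-L, L + 1)}
    h[0] = 1.0
    for _ in range(sweeps):
        new = {0: 1.0}
        for i in range(-L, L + 1):
            if i != 0:
                new[i] = sum(pk * h.get(i + k, 0.0) for k, pk in dist.items())
        h = new
    return h

dist = {-1: 2/9, 1: 1/3, 2: 4/9}
h = hit_zero(dist)
for i in (-3, -2, -1, 1, 2, 3):
    closed = 0.25 ** i if i >= 0 else 0.75 + 0.25 * (-2.0) ** i
    print(i, h[i], closed)
```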


9. Ladder process for sums of independent random variables 


For a sums of independent random variables Markov chain defined
on the integers, we define a sequence $s_n(\omega)$ of positive step times inductively as follows: $s_0(\omega)$ is the least $n$ such that $x_n(\omega) > x_0(\omega)$, and $s_k(\omega)$ is
the least $n$ such that $x_n(\omega) > x_{s_{k-1}(\omega)}(\omega)$. If we construct a stochastic
process by watching the old Markov chain only at the positive step
times, that is, by calling the $n$th outcome in the new process the $s_n$th
outcome in the old process, then the strong Markov property as
formulated in Theorem 4-9 implies that the new process is a Markov
chain. We shall go through this implication in detail.


Proposition 5-23: If $P$ is a sums of independent random variables
Markov chain defined on the integers, then the stochastic process whose
$n$th outcome is the $s_n$th outcome in $P$ is a Markov chain $P^+$. Moreover,
$P^+_{ij} = P^+_{0,j-i}$.


Proof: The times $s_n$ are random times. Applying Theorem 4-9 to
the time $s_n$ and the statement $r$ that $x_{s_{n+1}} = c_{n+1}$, we find that, if
$\Pr_\pi[x_{s_0} = c_0 \wedge \cdots \wedge x_{s_n} = c_n] > 0$, then

$$\Pr_\pi[x_{s_{n+1}} = c_{n+1} \mid x_{s_0} = c_0 \wedge \cdots \wedge x_{s_n} = c_n] = \Pr_\pi[x_{s_{n+1}} = c_{n+1} \mid x_{s_n} = c_n].$$

And if $\Pr_\pi[x_{s_n} = c_n] > 0$, then

$$\Pr_\pi[x_{s_{n+1}} = c_{n+1} \mid x_{s_n} = c_n] = \Pr_{c_n}[x_{s_0} = c_{n+1}].$$

Thus the process is a Markov chain $P^+$. The fact that $P^+_{ij} = P^+_{0,j-i}$
follows from the fact that $P$ represents sums of independent random
variables.


The chain $P^+$ is called the ladder process for $P$. The ladder process
moves from $i$ to $j$ if $j$ is the first state greater than $i$ that is reached in
the original process. If the mean step $m$ in $P$ is positive, then the
process reaches or passes any given positive state with probability one,
so that the $s_n$ are finite a.e. Hence $P^+1 = 1$, and the ladder process
represents sums of independent random variables.
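As an example, we shall compute explicitly the ladder process
associated with Example 2 of the preceding section. For the given
chain we have $p_{-1} = \tfrac25$, $p_1 = \tfrac15$, and $p_2 = \tfrac25$. The ladder process has
two $k$-values, namely 1 and 2, and thus has a distribution $\{p_1^+, p_2^+\}$.
Since the positive step times are finite a.e., we have $\sum_k p_k^+ = 1$, or

$$p_1^+ + p_2^+ = 1.$$

To find the values of $p_1^+$ and $p_2^+$, we note that the ladder process moves
from 0 to 2 either by a direct step of $+2$ or by a step to $-1$, followed by a
ladder step from $-1$ to 0 and then a ladder step from 0 to 2. Hence

$$p_2^+ = P_{02} + P_{0,-1}\,p_1^+ p_2^+ = p_2 + p_{-1}\,p_1^+ p_2^+.$$

Putting in the known values for $p_{-1}$ and $p_2$ and using $p_1^+ = 1 - p_2^+$,
we obtain $2(p_2^+)^2 + 3p_2^+ - 2 = 0$, and we find

$$p_1^+ = p_2^+ = \tfrac12.$$

The ladder process for our Example 2 is therefore an instance of
Example 1 in the same section.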
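The equation for $p_2^+$ can also be checked numerically. The sketch below (Python, assuming the same step distribution $p_{-1} = \tfrac25$, $p_1 = \tfrac15$, $p_2 = \tfrac25$) solves the quadratic $p_{-1}x^2 + (1 - p_{-1})x - p_2 = 0$ obtained by substituting $p_1^+ = 1 - x$, and compares the root with a Monte Carlo estimate of the probability that the first state above the starting point lies two units up. The seed and the number of trials are arbitrary choices.

```python
import math
import random

P_MINUS1, P_1, P_2 = 0.4, 0.2, 0.4   # assumed step distribution of Example 2

# p2+ = p_2 + p_{-1} (1 - p2+) p2+  <=>  p_{-1} x^2 + (1 - p_{-1}) x - p_2 = 0
a, b, c = P_MINUS1, 1 - P_MINUS1, -P_2
p2_plus = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # the root in [0, 1]
p1_plus = 1 - p2_plus


def first_ladder_height(rng: random.Random) -> int:
    """Run the walk from 0 until it first exceeds 0; return the state reached."""
    x = 0
    while x <= 0:
        u = rng.random()
        if u < P_MINUS1:
            x -= 1
        elif u < P_MINUS1 + P_1:
            x += 1
        else:
            x += 2
    return x


rng = random.Random(12345)
trials = 100_000
hits_2 = sum(first_ladder_height(rng) == 2 for _ in range(trials))
estimate = hits_2 / trials
print(round(p1_plus, 6), round(p2_plus, 6), estimate)
```

Because the mean step $\tfrac35$ is positive, each simulated walk exceeds 0 after finitely many steps with probability one, so the loop terminates.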


10. The basic example 


The basic example is a Markov chain with state space the non-negative
integers and with transition probabilities

$$P_{i,i+1} = p_i, \qquad P_{i0} = q_i = 1 - p_i, \qquad i \ge 0.$$

We normally assume that none of the $p_i$'s is 0. A row vector $\beta$ is
defined by

$$\beta_0 = 1, \qquad \beta_i = p_0 p_1 \cdots p_{i-1} \quad \text{for } i > 0;$$

it is regular if and only if $\lim_i \beta_i = 0$, and the process is recurrent or
transient according as the limit is or is not 0.

In this section we shall compute the H and N matrices for the basic 
example when it is transient, and we shall show that a transient basic 
example has no non-zero regular row vector. 

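The process cannot leave the set $\{0, 1, \ldots, j\}$ without hitting $j$.
Hence $H_{ij} = 1$ if $i \le j$. If $i > j$, then $j$ can be reached only via 0, so
that

$$H_{ij} = H_{i0} H_{0j} = H_{i0}$$

by Proposition 4-16. Thus we need only find $H_{i0}$. The only way the
process can fail to reach 0 is to continue moving to the right from $i$.
Let $\beta_\infty = \lim_j \beta_j = \prod_j p_j$. Then

$$1 - H_{i0} = \prod_{j \ge i} p_j = \frac{\beta_\infty}{\beta_i},$$

and we find

$$H_{ij} = \begin{cases} 1 & \text{if } i \le j, \\ 1 - \dfrac{\beta_\infty}{\beta_i} & \text{if } i > j. \end{cases}$$

Suppose now that the process is transient, that is, that $\beta_\infty > 0$.
Then

$$\bar H_{jj} = p_j H_{j+1,j} + q_j H_{0j} = p_j\Big(1 - \frac{\beta_\infty}{\beta_{j+1}}\Big) + q_j = 1 - \frac{\beta_\infty}{\beta_j},$$

so that

$$N_{jj} = \frac{1}{1 - \bar H_{jj}} = \frac{\beta_j}{\beta_\infty}$$

and

$$N_{ij} = H_{ij} N_{jj} = \begin{cases} \dfrac{\beta_j}{\beta_\infty} & \text{if } i \le j, \\[1ex] \dfrac{\beta_j}{\beta_\infty} - \dfrac{\beta_j}{\beta_i} & \text{if } i > j. \end{cases}$$

If $\beta_\infty > 0$, we know that $\beta$ is not regular. Indeed, a transient basic
example has no non-zero regular row vector. For if $\alpha$ is regular, then

$$\sum_i \alpha_i q_i = \alpha_0$$

and

$$\alpha_{j-1} p_{j-1} = \alpha_j \quad \text{for } j > 0.$$

From the second condition we find by induction that $\alpha_j = \alpha_0 \beta_j$. Then
the first condition yields

$$\alpha_0 = \sum_i \alpha_i q_i = \alpha_0 \sum_i \beta_i q_i = \alpha_0 \sum_i (\beta_i - \beta_{i+1}) = \alpha_0(1 - \beta_\infty).$$

Thus $\alpha_0 = 0$ and $\alpha = 0$.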
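The closed forms for $H$ and $N$ can be sanity-checked on a concrete transient basic example. The choice $p_i = 1 - 1/(i + 2)^2$ below is purely illustrative (it is not taken from the text); it gives $\beta_i = (i + 2)/(2(i + 1))$ and $\beta_\infty = \tfrac12$. The Python sketch checks, in exact arithmetic, that the formula for $N_{ij}$ satisfies $N = I + PN$ entrywise on a finite window, and that $\sum_{i<n} \beta_i q_i = 1 - \beta_n$, the telescoping identity behind $\sum_i \beta_i q_i = 1 - \beta_\infty$.

```python
from fractions import Fraction


# Illustrative (hypothetical) transient basic example: p_i = 1 - 1/(i+2)^2.
def p(i: int) -> Fraction:
    return 1 - Fraction(1, (i + 2) ** 2)


def q(i: int) -> Fraction:
    return 1 - p(i)


def beta(i: int) -> Fraction:
    """beta_0 = 1, beta_i = p_0 p_1 ... p_{i-1}."""
    out = Fraction(1)
    for k in range(i):
        out *= p(k)
    return out


BETA_INF = Fraction(1, 2)   # prod_{n>=2} (1 - 1/n^2) = 1/2 for this choice


def N(i: int, j: int) -> Fraction:
    """Closed form for the fundamental matrix of a transient basic example."""
    if i <= j:
        return beta(j) / BETA_INF
    return beta(j) / BETA_INF - beta(j) / beta(i)


# beta_i has the closed form (i + 2) / (2(i + 1)) for this choice of p_i.
for i in range(25):
    assert beta(i) == Fraction(i + 2, 2 * (i + 1))

# N = I + PN, i.e. N_ij = delta_ij + p_i N_{i+1,j} + q_i N_{0j}.
for i in range(20):
    for j in range(20):
        rhs = (1 if i == j else 0) + p(i) * N(i + 1, j) + q(i) * N(0, j)
        assert N(i, j) == rhs, (i, j)

# Telescoping: sum_{i<n} beta_i q_i = beta_0 - beta_n = 1 - beta_n.
for n in range(1, 25):
    assert sum(beta(i) * q(i) for i in range(n)) == 1 - beta(n)
```

The check $N = I + PN$ is a consistency test only: $N$ is the minimal non-negative solution of that equation, which a finite window of entries cannot establish.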


11. Problems 


1. Consider the finite Markov chain with states 0, 1, 2, 3, 4. States 0 and 4
are absorbing. At each of the other states the process takes a step to the
right with probability $\tfrac12$, or a step to the left with probability $\tfrac12$.
Compute $P$, $N$, and $B$ by means of Propositions 5-5 and 5-6.


2. If the states of a transient chain form a single closed set, show that each
column of $N$ is a non-constant positive superregular function. [Note:
We shall see later that there are no such functions for recurrent chains;
hence their existence is a necessary and sufficient condition for a closed
set to be transient.]


3. Prove Proposition 5-8. Prove also that $h^E = Ne^E + s^E$. Interpret
each result.


4. In the basic example, let $E = \{0, 1, 2\}$. Compute $B^E$, $h^E$, $e^E$, and $s^E$.
Check formulas (1), (2), and (3) in Proposition 5-8.


5. Prove an analog of Theorem 5-10 for row vectors. Use it to show that
if $\pi \ge \pi P \ge 0$ in a transient basic example, then there is a measure $\mu$
such that $\pi = \mu N$.


6. For a transient chain let

$$x_i = M_i[n_E],$$

where $n_E$ is the number of times the chain is in the finite set of states $E$.
Use a systems theorem to find an equation of the form


and prove that $x$ is the minimum non-negative solution.


7. Find the probability in the $p$-$q$ random walk started at 0 of reaching $+n$
before $-n$. [Hint: Use the results obtained for the finite drunkard's
walk.] If $p > q$, what happens to this probability as $n$ increases?


8. The one-dimensional symmetric random walk is a process to which
Corollary 5-21 applies. If $E$ is the set of primes, is $s^E$ equal to 1 or is it
equal to 0?


9. Let $x_0, x_1, x_2, \ldots$ be the outcome functions for the symmetric random
walk on the integers started at 0. Show that there is no non-constant
non-negative function $f(n)$ defined on the integers such that $f(x_0)$,
$f(x_1), \ldots$ is a martingale.


10. Show by direct computation that the sums of independent random
variables process on the integers with p = 4 and p_, = 2 is recurrent.


11. Find $H$ and $N$ for sums of independent random variables on the integers
with p_; = pz = 3.


Problems 12 to 19 refer to sums of independent random variables on the 
integers with p_, = 4 and p; = 2. 


12. Find $H$ and $N$.

13. Describe the long-range behavior of the chain.

14. Give two examples of infinite sets $E$ to illustrate the two possibilities
$s^E = 1$ and $s^E = 0$.

15. Find all non-negative regular functions.


16. Give a necessary and sufficient condition on a non-negative function $f$
that $Nf$ be finite-valued.

17. In the previous problem, let $g = Nf$. Choose an $f$ satisfying your
condition, and show that $(I - P)g = f$.


18. Let


i= 


2+ (4)! ift<0 
(3 + 1)(4)' if ti > 0. 
Show that h is superregular, and decompose h as in Theorem 5-10. 


19. Use the function of Problem 18 together with the Martingale Convergence
Theorem to prove that the process is to the left of 0 only finitely often a.e.


Problems 20 to 22 refer to the game of tennis. It will be necessary to know 
how one keeps score in tennis. A match is being played between A and B, 
and A has probability p of winning any one point. 


20. Set up a single game as a transient chain with the two absorbing states
“A wins” and “B wins.” [Minimize the number of states; e.g., identify
“30-30” with “deuce.”] Compute the probability that A wins the
game as a function of $p$.

21. Suppose that A has probability $p'$ of winning a game. What is the
probability that he wins a set? What of winning the match (if he is
required to win three sets)?

22. What is the probability that A wins the match if $p = 0.6$? What if
$p = 0.51$?


CHAPTER 6 


RECURRENT CHAINS 


1. Mean ergodic theorem for Markov chains 


Recurrent chains are Markov chains such that the set of states is a
single recurrent class. They have the properties that $P1 = 1$, $H = E$,
and $M_i[n_j] = \infty$. The study of recurrent chains begins with a
characterization of finite-valued non-negative superregular measures and
functions; the reader should turn back to Sections 1-6c and 1-6d for the
terms referred to in what follows.

We shall apply Proposition 1-63 and Corollary 1-64 to the sequence
of matrices obtained as the Cesaro sums of the powers of a recurrent
chain $P$. Define

$$L^{(n)} = \frac{1}{n}\sum_{k=0}^{n-1} P^k.$$

Then

$$0 \le L^{(n)} \le E \quad \text{for all } n.$$


Theorem 6-1: If $P$ is the matrix of a recurrent chain, then the sequence
of powers of $P$ is Cesaro summable to a limiting matrix $L$ with the
properties $L \ge 0$ and $LP = L = PL = L^2$.


Proof: We shall show that every convergent subsequence of $\{L^{(n)}\}$
converges to the same limit $L$. The proof proceeds in four steps.

(1) Since

$$L^{(n)} = \frac{1}{n}(I + P + \cdots + P^{n-1}),$$

we have

$$PL^{(n)} = \frac{1}{n}(P + P^2 + \cdots + P^n) = L^{(n)}P = L^{(n)} + \frac{1}{n}(P^n - I).$$

Let $\{L^{(\nu)}\}$ be a convergent subsequence; such a sequence exists by
Proposition 1-63. Set $L = \lim_\nu L^{(\nu)}$. Then $L \ge 0$. Since

$$\lim_n \frac{1}{n}(P^n - I) = 0,$$

we have $\lim_\nu PL^{(\nu)} = L = \lim_\nu L^{(\nu)}P$.

(2) By Proposition 1-56 (dominated convergence), we have

$$\lim_\nu PL^{(\nu)} = P \lim_\nu L^{(\nu)} = PL$$

and thus

$$PL = L.$$

By Proposition 1-55 (Fatou's Theorem), we may further conclude

$$(\lim_\nu L^{(\nu)})P \le \lim_\nu (L^{(\nu)}P) = L$$

and

$$LP \le L.$$

(3) Suppose $LP$ is not equal to $L$. Then for some $i$ and $j$, $(LP)_{ij} < L_{ij}$.
Summing the inequalities $(LP)_{ik} \le L_{ik}$ on $k$, we obtain

$$\sum_k (LP)_{ik} < \sum_k L_{ik},$$

since strict inequality holds in the $j$th entry and both sums are finite
($L1 \le 1$ by Fatou's Theorem, because $L^{(\nu)}1 = 1$). Thus $[(LP)1]_i < (L1)_i$.
Since $L$, $P$, and $1$ are non-negative, associativity holds and
$(LP)1 = L(P1) = L1$. Therefore, $[(LP)1]_i = (L1)_i$, and we have a
contradiction. Hence $LP = L = PL$. By induction, we readily see that
$LP^n = L = P^nL$ for every $n$. Averaging these results, we obtain finally

$$LL^{(n)} = L = L^{(n)}L.$$

(4) Let $\{L^{(\mu)}\}$ be a convergent subsequence with limit $\tilde L$. It is
sufficient to show that $\tilde L = L$. From step (3) we have $L = LL^{(\mu)}$
for any $\mu$, and by Fatou's Theorem $L1 \le 1$ and $\tilde L1 \le 1$. Thus, by
dominated convergence,

$$L = \lim_\mu (LL^{(\mu)}) = L\tilde L.$$

Interchanging the roles of $L$ and $\tilde L$, we find $\tilde L = \tilde L L$. But by Fatou's
Theorem $L\tilde L \le \tilde L$ and $\tilde L L \le L$. Therefore,

$$L = L\tilde L \le \tilde L \quad \text{and} \quad \tilde L = \tilde L L \le L.$$

Hence $\tilde L = L$ and $L = L^2$.
Definition 6-2: If $P$ is a recurrent chain, $P$ is said to be a null chain
if $L = 0$. If $L \ne 0$, $P$ is said to be an ergodic chain and the limit
matrix $L$ is called $A$.
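A finite periodic chain shows why Theorem 6-1 is stated for Cesaro averages: the powers $P^n$ need not converge, but $L^{(n)}$ does. The sketch below uses NumPy and a hypothetical three-state cycle $0 \to 1 \to 2 \to 0$ (not an example from the text); it computes $L^{(n)}$ directly from its definition and checks $LP = L = PL = L^2$ numerically.

```python
import numpy as np

# Hypothetical recurrent chain with period 3: 0 -> 1 -> 2 -> 0.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])


def cesaro_average(P: np.ndarray, n: int) -> np.ndarray:
    """L^(n) = (1/n)(I + P + ... + P^(n-1))."""
    total = np.zeros_like(P)
    power = np.eye(P.shape[0])
    for _ in range(n):
        total += power
        power = power @ P
    return total / n


L = cesaro_average(P, 3000)

# The plain powers keep cycling, so they do not converge ...
print(np.allclose(np.linalg.matrix_power(P, 3000),
                  np.linalg.matrix_power(P, 3001)))   # False
# ... but the Cesaro averages do, and the limit has the properties of L.
print(np.allclose(L @ P, L), np.allclose(P @ L, L), np.allclose(L @ L, L))
```

Here $L \ne 0$, so the chain is ergodic in the sense of Definition 6-2, and every row of $L$ equals $(\tfrac13, \tfrac13, \tfrac13)$.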
Proposition 6-3: If $P$ is a recurrent chain, every constant function is
regular and the only non-negative (finite-valued) superregular functions
are constants.

Proof: Constant functions are trivially regular since $P1 = 1$. Let
$h$ be a finite-valued non-negative superregular function, and let the
chain be started in any fixed state $i$. Since $M_i[|h(x_0)|] = h_i < \infty$,
$\{h(x_n)\}$ is a non-negative supermartingale (see Section 4-3). Thus
$\lim_n h(x_n)$ exists and is finite with probability one by Corollary 3-13.
If $h$ is not a constant function, then $h_j \ne h_k$ for some $j$ and $k$. Since the
chain is in states $j$ and $k$ infinitely often a.e., $h(x_n) = h_j$ and $h(x_n) = h_k$
for infinitely many $n$ with probability one. Thus $h(x_n)$ diverges a.e., a
contradiction.


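To prove the corresponding result for measures, we introduce the
dual matrix $\hat P$, defined whenever a positive finite-valued $P$-superregular
measure $\alpha$ exists. The entries of $\hat P$ are $\hat P_{ij} = \alpha_j P_{ji}/\alpha_i$.
Although we shall investigate $\hat P$ more fully in the next section, we mention
some of its properties here. Suppose $P$ is recurrent. Since $\hat P_{ij} \ge 0$
and since

$$\sum_j \hat P_{ij} = \frac{1}{\alpha_i}\sum_j \alpha_j P_{ji} \le \frac{\alpha_i}{\alpha_i} = 1,$$

$\hat P$ is a transition matrix. Since all pairs of states communicate in $P$,
they do in $\hat P$. Now, using induction on $n$, we note that if

$$(\hat P^{n-1})_{ij} = \frac{\alpha_j (P^{n-1})_{ji}}{\alpha_i} \quad \text{for all } i \text{ and } j,$$

then

$$(\hat P^n)_{ij} = \sum_k \hat P_{ik}(\hat P^{n-1})_{kj} = \sum_k \frac{\alpha_k P_{ki}}{\alpha_i}\cdot\frac{\alpha_j (P^{n-1})_{jk}}{\alpha_k} = \frac{\alpha_j}{\alpha_i}\sum_k (P^{n-1})_{jk}P_{ki} = \frac{\alpha_j (P^n)_{ji}}{\alpha_i}.$$

Summing on $n$, we see that $M_i[\hat n_j] = (\alpha_j/\alpha_i)M_j[n_i] = +\infty$ because
$M_j[n_i] = \infty$. Hence $\hat P$ is recurrent.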


Proposition 6-4: If $P$ is a recurrent chain, all (finite-valued) non-negative
superregular measures are regular and are uniquely determined
up to multiplication by a constant. A non-zero non-negative
superregular measure is positive.

Proof: We prove the second assertion first. Suppose $\alpha \ge \alpha P$.
Then $\alpha \ge \alpha P^n$ for every $n$. If $\alpha_j = 0$ and $\alpha_i > 0$, find $n$ such that
$(P^n)_{ij} > 0$. We have

$$\alpha_j \ge \sum_k \alpha_k (P^n)_{kj} \ge \alpha_i (P^n)_{ij} > 0,$$

a contradiction. Hence $\alpha > 0$. For the first assertion of the proposition,
let $\alpha$ and $\beta$ be non-zero non-negative finite-valued superregular
measures. Then $\alpha$ and $\beta$ are positive. Use $\alpha$ to form the
recurrent chain $\hat P$. Then $\hat P1 = 1$ since $\hat P$ is recurrent. Therefore,

$$1 = \sum_j \hat P_{ij} = \sum_j \frac{\alpha_j P_{ji}}{\alpha_i}.$$

Thus $\alpha_i = \sum_j \alpha_j P_{ji}$ and $\alpha$ is regular. If we can show that $\{\beta_i/\alpha_i\}$ is a
superregular function for $\hat P$, we will have shown that $\beta = c\alpha$ (by
Proposition 6-3 applied to the recurrent chain $\hat P$), and the
proof will be complete. We have

$$\sum_j \hat P_{ij}\frac{\beta_j}{\alpha_j} = \sum_j \frac{\alpha_j P_{ji}}{\alpha_i}\cdot\frac{\beta_j}{\alpha_j} = \frac{(\beta P)_i}{\alpha_i} \le \frac{\beta_i}{\alpha_i}.$$


Proposition 6-5: If $P$ is ergodic, then $A = 1\alpha$, $\alpha 1 = 1$, and $\alpha$ is
regular.

Proof: We have $PA = A$. Thus every column of $A$ is regular and
must be constant by Proposition 6-3. Hence $A = 1\alpha$. Since $AP = A$,
every row of $A$ is regular and $\alpha$ must be regular. It therefore remains to
be shown that $\alpha 1 = 1$. Now $A^2 = A$, so that $(1\alpha)(1\alpha) = 1\alpha$. By
associativity $1(\alpha(1\alpha)) = 1\alpha$, so that $\alpha(1\alpha) = \alpha$. But $\alpha(1\alpha) = (\alpha 1)\alpha$, so
that $(\alpha 1)\alpha = \alpha$. If $\alpha_j \ne 0$ (such a $j$ exists since $A \ne 0$), then from
$(\alpha 1)\alpha_j = \alpha_j$ we may conclude $\alpha 1 = 1$.
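For a finite ergodic chain, Proposition 6-5 says that every row of $A$ is the same regular measure $\alpha$ with $\alpha 1 = 1$. The NumPy sketch below, using a hypothetical three-state chain, finds $\alpha$ by solving $\alpha(I - P) = 0$, $\alpha 1 = 1$ as a least-squares system and compares $1\alpha$ with a long Cesaro average; the loose tolerance reflects the $O(1/n)$ error of $L^{(n)}$.

```python
import numpy as np

# Hypothetical ergodic (and noncyclic) chain on three states.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
n_states = P.shape[0]

# Solve alpha (I - P) = 0 together with alpha 1 = 1.
system = np.vstack([(np.eye(n_states) - P).T, np.ones(n_states)])
rhs = np.concatenate([np.zeros(n_states), [1.0]])
alpha, *_ = np.linalg.lstsq(system, rhs, rcond=None)

# Cesaro average L^(n) = (1/n) sum_{k<n} P^k approximates A.
n = 5000
total, power = np.zeros_like(P), np.eye(n_states)
for _ in range(n):
    total += power
    power = power @ P
A = total / n

# alpha should be close to (1/4, 1/2, 1/4), and every row of A close to alpha.
print(np.round(alpha, 6))
print(np.allclose(A, np.outer(np.ones(n_states), alpha), atol=1e-3))
```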


The existence of a positive regular measure for ergodic chains is thus
an easy matter to prove. For null chains, however, the proof is harder,
since the limiting matrix $L = 0$ is no help. The technique we shall use
is to watch the recurrent chain $P$ only while it is in a subset $E$ of the set
of states.

Let $E$ be a subset of states and let $P^E$ be the stochastic process whose
$n$th outcome is the outcome of $P$ the $n$th time the process $P$ is in the
set $E$. We shall see in Lemma 6-6 that $P^E$ is a Markov chain. From its
interpretation it is clear that $P^E$ is recurrent if $P$ is recurrent. Moreover,
if $E \subset F$, then $(P^F)^E = P^E$.

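The index set for the matrix $P^E$ is taken to be $E$. Writing $P$ as

$$P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix},$$

where the first block row and column are indexed by $E$ and the second by
its complement $\tilde E$, we have the following relationship between $P^E$ and $P$.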
Lemma 6-6: For an arbitrary Markov chain $P$, $P^E$ is a Markov chain
and

$$P^E = T + UNR,$$

where $N = \sum_{n=0}^\infty Q^n$.

Remark: The lemma holds even if $N$ has infinite entries, provided we
agree as usual that $0 \cdot \infty = 0$.

Proof: Let $y_n$ be the $n$th outcome in the $P^E$-process. If

$$\Pr_i[y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}] > 0,$$

let $t$ be the random time of outcome $n - 1$ and apply Theorem 4-9.
Then

$$\begin{aligned} \Pr_i[y_n = c_n \mid y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}] &= \Pr_i[y_n = c_n \mid y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1} \wedge x_t = c_{n-1}] \\ &= \Pr_{c_{n-1}}[y_1 = c_n], \end{aligned}$$

and it follows that $P^E$ is a Markov chain. Now let $i$ and $j$ be in $E$.
Applying Theorem 4-10 with the random time identically one and with
the statement that $E$ is hit after time 0 first at state $j$, we have

$$P^E_{ij} = \sum_k P_{ik} B^E_{kj} = \sum_{k \in E} P_{ik} B^E_{kj} + \sum_{k \notin E} P_{ik} B^E_{kj} = P_{ij} + \sum_{k \notin E} P_{ik} B^E_{kj}.$$

The result then follows from Proposition 5-6, since $B^E_{kj} = (NR)_{kj}$ for $k \notin E$.
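Lemma 6-6 is easy to exercise on a finite chain. The NumPy sketch below, for a hypothetical irreducible four-state chain and $E = \{0, 1\}$, partitions $P$ into $T$, $U$, $R$, $Q$, forms $P^E = T + UNR$ with $N = (I - Q)^{-1}$ (the finite-state value of $\sum Q^n$), and checks that $P^E$ is stochastic and that the restriction $\alpha_E$ of the stationary measure is $P^E$-regular, as Lemma 6-7 and Proposition 6-4 together predict for a recurrent chain.

```python
import numpy as np

# Hypothetical irreducible (hence recurrent) chain on four states.
P = np.array([[0.10, 0.40, 0.50, 0.00],
              [0.30, 0.00, 0.30, 0.40],
              [0.20, 0.20, 0.20, 0.40],
              [0.50, 0.00, 0.25, 0.25]])
E = [0, 1]
not_E = [2, 3]

T = P[np.ix_(E, E)]
U = P[np.ix_(E, not_E)]
R = P[np.ix_(not_E, E)]
Q = P[np.ix_(not_E, not_E)]
N = np.linalg.inv(np.eye(len(not_E)) - Q)   # sum_{n>=0} Q^n for a finite chain

P_E = T + U @ N @ R                          # Lemma 6-6

# Stationary measure alpha of P (alpha P = alpha, alpha 1 = 1).
k = P.shape[0]
system = np.vstack([(np.eye(k) - P).T, np.ones(k)])
rhs = np.concatenate([np.zeros(k), [1.0]])
alpha, *_ = np.linalg.lstsq(system, rhs, rcond=None)

alpha_E = alpha[E]
print(P_E.sum(axis=1))                       # each row should sum to 1
print(np.allclose(alpha_E @ P_E, alpha_E))   # alpha restricted to E is P^E-regular
```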


Lemma 6-7: For an arbitrary Markov chain $P$, if $E$ is a subset of
states and $\beta$ is a finite-valued non-negative $P$-superregular measure,
then $\beta_E$ is $P^E$-superregular.

Proof: Since $\beta \ge \beta P$, multiplication of the submatrices of $\beta$ by the
submatrices of $P$ gives the two relations

$$\beta_E \ge \beta_E T + \beta_{\tilde E} R,$$
$$\beta_{\tilde E} \ge \beta_E U + \beta_{\tilde E} Q.$$

We may rewrite the second relation as

$$\beta_{\tilde E}(I - Q) \ge \beta_E U \ge 0.$$

The proof of Theorem 5-10 translates directly into a proof for row
vectors. From it we find

$$\beta_{\tilde E} = \gamma N + \rho,$$

where $\gamma = \beta_{\tilde E}(I - Q)$, $N = \sum Q^n$, and $\rho$ is non-negative and $Q$-regular.
Hence

$$\beta_{\tilde E} \ge \gamma N \ge (\beta_E U)N.$$

Thus $\beta_E P^E = \beta_E T + \beta_E UNR \le \beta_E T + \beta_{\tilde E} R \le \beta_E$.
Lemma 6-8: No finite null chains exist.

Proof: We have $L^{(n)}1 = 1$, or $\sum_j L^{(n)}_{ij} = 1$, for every $n$. Since the
limit of a finite sum is the sum of the limits,

$$(L1)_i = \sum_j L_{ij} = \sum_j \lim_n L^{(n)}_{ij} = \lim_n \sum_j L^{(n)}_{ij} = 1.$$

Hence $L \ne 0$.


Theorem 6-9: Every recurrent chain $P$ has a positive finite-valued
regular measure $\alpha$ which is unique up to multiplication by a scalar.
Furthermore, $\alpha 1 < \infty$ if and only if $P$ is ergodic.

Proof: Order the states by the positive integers, let $E$ be the first
$n$ of the states, and let $F$ be the first $n + 1$. Then $P^E$ and $P^F$ are
ergodic chains (by Lemma 6-8) and have regular measures $\alpha^E$ and $\alpha^F$.
Also $(P^F)^E = P^E$. Thus $\alpha^F_E$ is $P^E$-superregular by Lemma 6-7, hence
$P^E$-regular by Proposition 6-4, and we may choose $\alpha^F$ such that
$\alpha^F_E = \alpha^E$ by the uniqueness part of Proposition 6-4. The procedure of
adding a single state to $F$ may be continued by induction, and we set
$\alpha = \lim_{E \to S}(\alpha^E, 0)$. Now for any of these sets $E$ we have

$$\alpha_E T \le \alpha_E T + \alpha_E UNR = \alpha_E P^E = \alpha_E,$$

or

$$\alpha_E T \le \alpha_E.$$

Thus,

$$\alpha\begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix} = (\alpha_E T, 0) \le (\alpha_E, 0) \le \alpha.$$

As $E \to S$, the entries of $\begin{pmatrix} T & 0 \\ 0 & 0 \end{pmatrix}$ increase monotonically from zero to
the entries of $P$. Hence, by monotone convergence, $\alpha P \le \alpha$ and, by
Proposition 6-4, $\alpha$ is regular. Clearly $\alpha > 0$, and we know (Proposition 6-5)
that if $P$ is ergodic then $\alpha 1 < \infty$. Conversely, suppose $\alpha 1 < \infty$. Then, by
dominated convergence,

$$\alpha L = \lim_n \alpha L^{(n)} = \lim_n \frac{1}{n}\alpha(I + P + \cdots + P^{n-1}) = \alpha \ne 0,$$

and $L \ne 0$.
The proof of Proposition 6-4 is somewhat artificial without further
explanation. What was used was a standard method of converting 
proofs about functions into proofs about measures. We had proved 
uniqueness for non-negative finite-valued superregular functions, and 
the idea was to take advantage of this fact in the result for measures. 

The isomorphism that exists between row vectors and column 
vectors is known as duality. Not only does duality make rigorous the 
correspondence between row and column vectors, but also it provides 
easy proofs of some new results. 


Definition 6-10: Let $P$ be an arbitrary Markov chain transition
matrix and suppose there exists a positive finite-valued $P$-superregular
measure $\alpha$. The $\alpha$-dual matrix of $P$ is a matrix $\hat P$ defined by

$$\hat P_{ij} = \frac{\alpha_j P_{ji}}{\alpha_i}.$$

Let $D$ be a diagonal matrix with diagonal entries $1/\alpha_i$.

We note that $\hat P = DP^TD^{-1}$.

We cannot define duality in general, because we are not always
assured of the existence of a positive superregular measure. However,
there are only two important special cases, and we know that a
superregular measure $\alpha$ exists for each of them:

(1) $P$ is recurrent. Then there exists a unique $\alpha$-dual of $P$ (since $\alpha$ is
unique up to a constant factor, which cancels in $\hat P_{ij} = \alpha_j P_{ji}/\alpha_i$). We call
$\hat P$ the dual of $P$ or the reverse chain. We shall investigate the properties
of the reverse chain in some detail in Section 8.

(2) $P$ has only transient states. Then, as we saw in Section 5-2, a
positive superregular $\alpha$ exists. All duality statements are relative to
such a vector $\alpha$, but there is no assurance that $\alpha$ is unique.


Proposition 6-11: If $P$ is a transition matrix, then so is $\hat P$. If all
pairs of states in $P$ communicate, then all pairs of states in $\hat P$
communicate.

Proof: It is clear that $\hat P \ge 0$. For $\hat P1 \le 1$ we have

$$\sum_j \hat P_{ij} = \sum_j \frac{\alpha_j P_{ji}}{\alpha_i} = \frac{1}{\alpha_i}\sum_j \alpha_j P_{ji} \le \frac{\alpha_i}{\alpha_i} = 1.$$

If $i$ and $j$ communicate in $P$ by the routes

$$i, m_1, m_2, \ldots, m_r, j \quad\text{and}\quad j, n_1, n_2, \ldots, n_s, i,$$

then, since $\hat P_{kl} > 0$ exactly when $P_{lk} > 0$, they communicate in $\hat P$ by

$$i, n_s, \ldots, n_1, j \quad\text{and}\quad j, m_r, \ldots, m_1, i.$$


Proposition 6-12: If $P$ is a transition matrix, then

$$\hat P^n = D(P^n)^TD^{-1} \quad\text{and}\quad \{M_i[\hat n_j]\} = D(\{M_i[n_j]\})^TD^{-1}.$$

If $P$ is either (1) recurrent or (2) transient with only transient states,
then $\hat P$ is of the same type. If $P$ is of the second type, then

$$\hat N = DN^TD^{-1}.$$

Proof: The proof of the first assertion is by induction on $n$. The
case $n = 1$ is Definition 6-10. Suppose that

$$\hat P^{n-1} = D(P^{n-1})^TD^{-1}.$$

Then

$$\hat P^n = \hat P\hat P^{n-1} = (DP^TD^{-1})(D(P^{n-1})^TD^{-1}) = D(P^{n-1}P)^TD^{-1} = D(P^n)^TD^{-1}.$$

Associativity holds because all the matrices are non-negative. Now

$$\{M_i[\hat n_j]\} = \sum_n \hat P^n = D\Big(\sum_n (P^n)^T\Big)D^{-1} = D(\{M_i[n_j]\})^TD^{-1}.$$

In particular, $M_i[\hat n_i] = M_i[n_i]$, so that if $M_i[n_i]$ is infinite, then so is
$M_i[\hat n_i]$. Hence, by Proposition 6-11, if $P$ is recurrent, so is $\hat P$.


Definition 6-13: Let $P$ be a transition matrix, and let $\alpha$ be a positive
finite-valued superregular measure. Let $Y$ be any square matrix, let
$\beta$ be any row vector, and let $f$ be any column vector, all indexed by the
set of states. Define $D$ to be a diagonal matrix whose diagonal entries
are $1/\alpha_i$. The duals of $Y$, $\beta$, and $f$ are defined by

$$\operatorname{dual} Y = DY^TD^{-1}, \qquad \operatorname{dual}\beta = D\beta^T, \qquad \operatorname{dual} f = f^TD^{-1}.$$

The dual of a number is that number.


We see that the dual of a row vector is a column vector and that the
dual of a column vector is a row vector. The reader should note that $\hat P$
is identical with $\operatorname{dual} P$ and that part of the content of Proposition
6-12 is that $M_i[n_j]$ transforms to the $\hat P$ chain in the same way that $P_{ij}$
does:

$$\{M_i[\hat n_j]\} = \operatorname{dual}\{M_i[n_j]\}.$$


The fundamental properties of duals are listed in the next proposition. 
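Proposition 6-14: Let $X$ and $Y$ be square matrices, row vectors, or
column vectors indexed by the set of states for a Markov chain $P$.
Suppose $\hat P$ is the $\alpha$-dual of $P$. Then

(1) $\operatorname{dual}\operatorname{dual} X = X$.

(2) $\operatorname{dual}(X + Y) = \operatorname{dual} X + \operatorname{dual} Y$.

(3) $\operatorname{dual}(cX) = c\operatorname{dual} X$.

(4) $\operatorname{dual}(XY) = \operatorname{dual} Y\operatorname{dual} X$.

(5) $\operatorname{dual} I = I$ and $\operatorname{dual} 0 = 0$.

(6) If $X \ge 0$, then $\operatorname{dual} X \ge 0$; and if $X > 0$, then $\operatorname{dual} X > 0$.

(7) If $X \ge Y$, then $\operatorname{dual} X \ge \operatorname{dual} Y$; and if $X > Y$, then
$\operatorname{dual} X > \operatorname{dual} Y$.

(8) If $f$ is a $P$-superregular (or subregular) column vector, then
$\operatorname{dual} f$ is a $\hat P$-superregular (or subregular) row vector; and if $\beta$
is a $P$-superregular (or subregular) row vector, then $\operatorname{dual}\beta$ is a
$\hat P$-superregular (or subregular) column vector.

(9) $\operatorname{dual} 1 = \alpha$ and $\operatorname{dual}\alpha = 1$. The measure $\alpha$ is $\hat P$-superregular.

(10) If $\lim_n X^{(n)} = X$, then $\lim_n \operatorname{dual} X^{(n)} = \operatorname{dual} X$.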


Proof: We shall prove only (1) and (4); the rest of the proof is left
to the reader. For (1) we have

$$\operatorname{dual}\operatorname{dual} X = \operatorname{dual}(DX^TD^{-1}) = D(DX^TD^{-1})^TD^{-1} = DD^{-1}XDD^{-1} = X.$$

Associativity holds because $D$ and $D^{-1}$ are diagonal matrices.
For (4) we have

$$\operatorname{dual}(XY) = D(XY)^TD^{-1} = DY^TX^TD^{-1} = (DY^TD^{-1})(DX^TD^{-1}) = \operatorname{dual} Y\operatorname{dual} X.$$


We may summarize Proposition 6-14 by saying that the operation
dual is its own inverse, it reverses products, and it preserves sums,
equalities, inequalities, regularity, and limits. We know, for example,
that the dual of a recurrent chain is recurrent, and since dual is one-to-one,
a dual recurrent chain is the most general recurrent chain. Hence a
proof “for all recurrent chains $\hat P$” is a proof for all recurrent chains.
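The matrix form $\hat P = DP^TD^{-1}$ makes the rules of Proposition 6-14 easy to check numerically. In the sketch below (NumPy; the four-state chain and the helper names `dual_matrix`, `dual_row`, `dual_col` are illustrative), $\alpha$ is the stationary measure of a recurrent chain, so $\hat P$ is again stochastic. The checks cover $\operatorname{dual}\operatorname{dual} X = X$, $\operatorname{dual}(XY) = \operatorname{dual} Y \operatorname{dual} X$, $\operatorname{dual} 1 = \alpha$, $\operatorname{dual}\alpha = 1$, and $\hat P^n = D(P^n)^TD^{-1}$ from Proposition 6-12.

```python
import numpy as np

# Hypothetical irreducible chain; alpha is its stationary measure.
P = np.array([[0.10, 0.40, 0.50, 0.00],
              [0.30, 0.00, 0.30, 0.40],
              [0.20, 0.20, 0.20, 0.40],
              [0.50, 0.00, 0.25, 0.25]])
k = P.shape[0]
system = np.vstack([(np.eye(k) - P).T, np.ones(k)])
alpha, *_ = np.linalg.lstsq(system, np.concatenate([np.zeros(k), [1.0]]), rcond=None)

D = np.diag(1.0 / alpha)         # diagonal entries 1/alpha_i
D_inv = np.diag(alpha)


def dual_matrix(Y):
    return D @ Y.T @ D_inv        # dual Y = D Y^T D^{-1}


def dual_row(beta):
    return D @ beta               # dual of a row vector: entries beta_i / alpha_i


def dual_col(f):
    return f @ D_inv              # dual of a column vector: entries f_i alpha_i


P_hat = dual_matrix(P)            # the reverse chain
rng = np.random.default_rng(0)
X, Y = rng.random((k, k)), rng.random((k, k))

checks = {
    "P_hat stochastic": np.allclose(P_hat.sum(axis=1), 1.0),
    "dual dual X = X": np.allclose(dual_matrix(dual_matrix(X)), X),
    "dual(XY) = dual Y dual X": np.allclose(dual_matrix(X @ Y),
                                            dual_matrix(Y) @ dual_matrix(X)),
    "dual 1 = alpha": np.allclose(dual_col(np.ones(k)), alpha),
    "dual alpha = 1": np.allclose(dual_row(alpha), np.ones(k)),
    "P_hat^5 = D (P^5)^T D^-1": np.allclose(np.linalg.matrix_power(P_hat, 5),
                                           dual_matrix(np.linalg.matrix_power(P, 5))),
}
print(checks)
```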

The essential feature of duality lies in this last statement; we shall
apply it to the proof of Proposition 6-4. We start with a recurrent chain
$P$ and two positive superregular measures $\alpha$ and $\beta$. Forming $\hat P$, the
$\alpha$-dual of $P$, we observe that since $\hat P$ is the most general recurrent chain
and since $1$ is $\hat P$-regular, the dual of $1$ must be $P$-regular. Thus $\alpha$ is
regular. Now since $\beta$ is $P$-superregular and non-negative, $\operatorname{dual}\beta$ is
$\hat P$-superregular and non-negative. Hence $\operatorname{dual}\beta$ is a constant vector, and
the proof that $\beta$ is a constant multiple of $\alpha$ is complete.

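To form the $\alpha$-dual of the restriction of a matrix, we use the appropriate
restrictions of $D$ and $D^{-1}$ which make the matrix products
defined. For example, write

$$P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix}$$

(rows and columns indexed first by $E$ and then by $\tilde E$). By definition,

$$\operatorname{dual} T = D_E T^T D_E^{-1}, \qquad \operatorname{dual} U = D_{\tilde E} U^T D_E^{-1},$$
$$\operatorname{dual} R = D_E R^T D_{\tilde E}^{-1}, \qquad \operatorname{dual} Q = D_{\tilde E} Q^T D_{\tilde E}^{-1}.$$

Note that

$$\hat P = DP^TD^{-1} = \begin{pmatrix} D_E & 0 \\ 0 & D_{\tilde E} \end{pmatrix}\begin{pmatrix} T^T & R^T \\ U^T & Q^T \end{pmatrix}\begin{pmatrix} D_E^{-1} & 0 \\ 0 & D_{\tilde E}^{-1} \end{pmatrix} = \begin{pmatrix} D_E T^T D_E^{-1} & D_E R^T D_{\tilde E}^{-1} \\ D_{\tilde E} U^T D_E^{-1} & D_{\tilde E} Q^T D_{\tilde E}^{-1} \end{pmatrix},$$

so that

$$\operatorname{dual} T = \hat T, \qquad \operatorname{dual} U = \hat R, \qquad \operatorname{dual} R = \hat U, \qquad \operatorname{dual} Q = \hat Q.$$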


To make effective use of duality, it is convenient to know what
interpretation, if any, the duals of the matrices associated with $P$ have
in terms of the $P$-process. At this time we shall calculate the duals of
${}^EP$, $P^E$, and $B^E$.
If we let ${}^EP$ be the process $P$ watched until it enters $E$, then ${}^EP$ has
transition matrix

$$\begin{pmatrix} 0 & 0 \\ 0 & Q \end{pmatrix}$$

and fundamental matrix

$${}^EN = \begin{pmatrix} 0 & 0 \\ 0 & N \end{pmatrix}, \qquad N = \sum_n Q^n.$$

Hence $\operatorname{dual}{}^EP = {}^E\hat P$ and $\operatorname{dual}{}^EN = {}^E\hat N$.
The duals of $P^E$ and $B^E$ are not so trivial to settle, and we shall
state what they are as the next two propositions.


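Proposition 6-15: The dual of $P^E$ is $\hat P^E$.

Proof: By Lemma 6-6,

$$P^E = T + U\Big(\sum_n Q^n\Big)R.$$

Hence

$$\operatorname{dual} P^E = \operatorname{dual} T + (\operatorname{dual} R)\Big(\sum_n (\operatorname{dual} Q)^n\Big)(\operatorname{dual} U) = \hat T + \hat U\Big(\sum_n \hat Q^n\Big)\hat R = \hat P^E.$$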
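Proposition 6-16:

$$(\operatorname{dual} B^E)_{ij} = \begin{cases} {}^E\hat{\bar N}_{ij} & \text{if } i \in E, \\ 0 & \text{if } i \notin E, \end{cases}$$

where ${}^E\hat{\bar N}_{ij}$ is the mean number of times that the $\alpha$-dual process started at $i$
is in $j$ before returning to $E$.

Proof: Let $N = \sum Q^n$. Then

$$B^E = \begin{pmatrix} I & 0 \\ NR & 0 \end{pmatrix},$$

so that

$$\operatorname{dual} B^E = \begin{pmatrix} I & \hat U\hat N \\ 0 & 0 \end{pmatrix}.$$

Thus if $i \notin E$, then $(\operatorname{dual} B^E)_{ij} = 0$, and if $i$ and $j$ are in $E$, then
$(\operatorname{dual} B^E)_{ij} = \delta_{ij} = {}^E\hat{\bar N}_{ij}$.
If $i \in E$ and $j \notin E$, then the result that

$$(\hat U\hat N)_{ij} = {}^E\hat{\bar N}_{ij}$$

follows from Theorem 4-11 with the random time identically one.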
If we define ${}^E\bar N$ to be a matrix indexed by $E$ and $S$ whose $ij$th entry
is the mean number of times, starting in $i$, that the process is in $j$ before
returning to $E$, then we may rewrite the result of Proposition 6-16 as

$$\operatorname{dual} B^E = \begin{pmatrix} {}^E\hat{\bar N} \\ 0 \end{pmatrix}.$$

The rest of this section contains applications of Propositions 6-15
and 6-16. We begin by deriving two identities relating $B^E$ to other
matrices, and we shall then dualize the first identity to obtain a result
which will be used in Chapters 8 and 9. Finally we shall apply
Proposition 6-16 in a different way to get a probabilistic interpretation
for $\alpha_j/\alpha_i$.

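Proposition 6-17: For any set $E$,

$$(I - P)B^E = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}.$$

If $N^{(n)} = I + P + \cdots + P^n$, then

$$B^E N^{(n)} = N^{(n)} + {}^EN(P^{n+1} - I).$$

Proof: Set $N = \sum Q^n$. For the first identity we have

$$(I - P)B^E = \begin{pmatrix} I - T & -U \\ -R & I - Q \end{pmatrix}\begin{pmatrix} I & 0 \\ NR & 0 \end{pmatrix} = \begin{pmatrix} I - (T + UNR) & 0 \\ -R + (I - Q)NR & 0 \end{pmatrix} = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix},$$

since $(I - Q)N = I$. For the second identity, we have

$${}^EN = \begin{pmatrix} 0 & 0 \\ 0 & N \end{pmatrix},$$

and hence

$${}^EN P = \begin{pmatrix} 0 & 0 \\ NR & NQ \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ NR & N - I \end{pmatrix}$$

and

$${}^EN(P - I) = \begin{pmatrix} 0 & 0 \\ NR & -I \end{pmatrix} = B^E - I.$$

Therefore ${}^EN(P^{n+1} - I) = [{}^EN(P - I)]N^{(n)} = (B^E - I)N^{(n)}$.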
Proposition 6-18: For any set $E$,

$$\begin{pmatrix} {}^E\bar N \\ 0 \end{pmatrix}(I - P) = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}.$$

Proof: Apply duality to the first identity of Proposition 6-17, using
Propositions 6-15 and 6-16. Then

$$\begin{pmatrix} {}^E\hat{\bar N} \\ 0 \end{pmatrix}(I - \hat P) = \begin{pmatrix} I - \hat P^E & 0 \\ 0 & 0 \end{pmatrix}.$$

Since this identity holds for all reverse processes $\hat P$, it holds for all
processes.


Using Proposition 6-16, we can obtain a simple interpretation for the
ratio $\alpha_j/\alpha_i$. The case in which $P$ is recurrent is of special importance
because $\alpha$ is unique up to multiplication by a constant. But first we
prove a more general result.


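Proposition 6-19: Let $\alpha$ be a positive finite-valued superregular
measure for $P$, and let $\hat P$ be the $\alpha$-dual for $P$. Then for any set $E$,

$$\sum_{i \in E} \alpha_i\,{}^E\bar N_{ij} = \alpha_j \hat h^E_j.$$

Proof: By Proposition 6-16 (applied to $\hat P$, whose $\alpha$-dual is $P$),

$$(\operatorname{dual}\hat B^E)_{ij} = \begin{cases} {}^E\bar N_{ij} & \text{for } i \in E, \\ 0 & \text{for } i \notin E. \end{cases}$$

Therefore

$$\operatorname{dual}\hat h^E = \operatorname{dual}(\hat B^E 1) = \alpha(\operatorname{dual}\hat B^E) = \Big\{\sum_{i \in E}\alpha_i\,{}^E\bar N_{ij}\Big\}_j,$$

while by Definition 6-13 the $j$th entry of $\operatorname{dual}\hat h^E$ is $\alpha_j\hat h^E_j$.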


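Corollary 6-20: Let $\alpha$ be a positive finite-valued superregular measure
for $P$, and let $\hat P$ be the $\alpha$-dual of $P$. Then

$${}^i\bar N_{ij} = \frac{\alpha_j\hat H_{ji}}{\alpha_i}.$$

Proof: Set $E = \{i\}$ in Proposition 6-19.

In particular, ${}^i\bar N_{ij} \le \alpha_j/\alpha_i$ for any such $\alpha$.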
Corollary 6-21: Let $\alpha$ be a positive finite-valued regular measure for
a recurrent chain $P$. Then

$${}^i\bar N_{ij} = \alpha_j/\alpha_i.$$

Proof: Since $P$ is recurrent, so is $\hat P$. Thus $\hat H_{ji} = 1$, and we may
apply Corollary 6-20.
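Corollary 6-21 can be checked directly on a finite recurrent chain: the mean number of visits to $j$ before the first return to $i$ should equal $\alpha_j/\alpha_i$. The NumPy sketch below, for a hypothetical four-state chain, computes ${}^i\bar N_{ij}$ for $j \ne i$ as $\sum_k P_{ik}\,({}^iN)_{kj}$, where ${}^iN = (I - Q_i)^{-1}$ and $Q_i$ is $P$ with row and column $i$ removed, sets ${}^i\bar N_{ii} = 1$ for the visit at time 0, and compares with $\alpha/\alpha_i$.

```python
import numpy as np

# Hypothetical irreducible chain; alpha is its stationary measure.
P = np.array([[0.10, 0.40, 0.50, 0.00],
              [0.30, 0.00, 0.30, 0.40],
              [0.20, 0.20, 0.20, 0.40],
              [0.50, 0.00, 0.25, 0.25]])
k = P.shape[0]
system = np.vstack([(np.eye(k) - P).T, np.ones(k)])
alpha, *_ = np.linalg.lstsq(system, np.concatenate([np.zeros(k), [1.0]]), rcond=None)


def visits_before_return(P: np.ndarray, i: int) -> np.ndarray:
    """Row {^i N-bar_{ij}}_j: mean visits to j before returning to i, from i."""
    others = [s for s in range(P.shape[0]) if s != i]
    Q_i = P[np.ix_(others, others)]
    taboo_N = np.linalg.inv(np.eye(len(others)) - Q_i)
    row = np.empty(P.shape[0])
    row[i] = 1.0                              # the visit at time 0
    row[others] = P[i, others] @ taboo_N      # later visits, before the return
    return row


for i in range(k):
    assert np.allclose(visits_before_return(P, i), alpha / alpha[i])
print("Corollary 6-21 holds for every i in this example")
```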


Corollary 6-22: Let $\alpha$ be a positive finite-valued regular measure for
a recurrent chain $P$. Then

$$\sum_{i \in E}\alpha_i\,{}^E\bar N_{ij} = \alpha_j \quad \text{for all } j.$$

Proof: Apply Proposition 6-19. Then $\hat h^E = 1$, since $\hat P$ is recurrent.


Definition 6-23: Let $P$ be a recurrent chain. Set

$$M = \{M_i[t_j]\}$$

and

$$\bar M = \{M_i[\bar t_j]\}.$$

The matrix $M$ is called the mean first passage time matrix. Similarly
$M_{iE}$ is the mean time from $i$ until $E$ is reached, and $\bar M_{iE}$ is the mean
time from $i$ to return to $E$.


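Proposition 6-24: If $P$ is a recurrent chain with positive regular
measure $\alpha$, then

$$\sum_{i \in E}\alpha_i\bar M_{iE} = \alpha 1.$$

Proof: We have

$$\sum_{i \in E}\alpha_i\bar M_{iE} = \sum_{i \in E}\alpha_i\sum_j {}^E\bar N_{ij} = \sum_j\sum_{i \in E}\alpha_i\,{}^E\bar N_{ij} = \sum_j \alpha_j = \alpha 1,$$

the first equality holding because the time to return to $E$ is the total
number of time units (counting time 0) spent in the various states before
the return, and the next to last equality following from Corollary 6-22.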


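Proposition 6-25: If $P$ is a recurrent chain with positive regular
measure $\alpha$, then

$$\bar M_{ii} = \begin{cases} 1/\alpha_i & \text{if } P \text{ is ergodic and } \alpha 1 = 1, \\ +\infty & \text{if } P \text{ is null}. \end{cases}$$

Proof: Set $E = \{i\}$ in Proposition 6-24, and recall from Theorem 6-9
that $\alpha 1 = \infty$ if $P$ is null.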
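Proposition 6-25 is the result often called Kac's formula for the mean return time, and Proposition 6-24 extends it to sets. The NumPy sketch below, for a hypothetical four-state chain with $\alpha$ normalized so that $\alpha 1 = 1$, computes $\bar M_{iE} = 1 + \sum_{k \notin E} P_{ik} M_{kE}$, with the mean hitting times $M_{kE}$ for $k \notin E$ obtained by solving $(I - Q)m = 1$, and checks $\sum_{i \in E}\alpha_i \bar M_{iE} = 1$ for several sets $E$, including singletons.

```python
import numpy as np

# Hypothetical irreducible chain; alpha is its stationary measure (alpha 1 = 1).
P = np.array([[0.10, 0.40, 0.50, 0.00],
              [0.30, 0.00, 0.30, 0.40],
              [0.20, 0.20, 0.20, 0.40],
              [0.50, 0.00, 0.25, 0.25]])
k = P.shape[0]
system = np.vstack([(np.eye(k) - P).T, np.ones(k)])
alpha, *_ = np.linalg.lstsq(system, np.concatenate([np.zeros(k), [1.0]]), rcond=None)


def mean_return_times(P: np.ndarray, E: list) -> np.ndarray:
    """Mean return times M-bar_{iE} to E, for i in E (in the order of E)."""
    not_E = [s for s in range(P.shape[0]) if s not in E]
    Q = P[np.ix_(not_E, not_E)]
    # M_{kE} for k outside E solves (I - Q) m = 1.
    m = np.linalg.solve(np.eye(len(not_E)) - Q, np.ones(len(not_E)))
    return np.array([1.0 + P[i, not_E] @ m for i in E])


for E in ([0], [1], [2], [3], [0, 1], [1, 3], [0, 2, 3]):
    total = alpha[E] @ mean_return_times(P, E)
    assert np.isclose(total, 1.0), (E, total)       # Proposition 6-24 with alpha 1 = 1

print(mean_return_times(P, [2]), 1 / alpha[2])      # Proposition 6-25: equal values
```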


3. Cyclicity 


Let $P$ be a recurrent Markov chain and let $i$ be a fixed state of the
chain. Define a set of positive integers $T$ by

$$T = \{k \mid (P^k)_{ii} > 0,\ k > 0\}.$$

Let $d$ be the greatest common divisor of the integers in $T$.

Lemma 6-26: $T$ is non-empty and is closed under addition.

Proof: $T$ is clearly non-empty since $P$ is recurrent. Suppose $m$ and
$n$ are integers in $T$. Then $(P^m)_{ii} > 0$ and $(P^n)_{ii} > 0$, so that

$$(P^{m+n})_{ii} = \sum_k (P^m)_{ik}(P^n)_{ki} \ge (P^m)_{ii}(P^n)_{ii} > 0.$$

Hence $m + n$ is in $T$.


Noting the discussion in Section 1-6e, we arrive at the following
result, using Lemma 1-66.

Lemma 6-27: $T$ contains all sufficiently large multiples of its greatest
common divisor $d$.

The integer $d$ we shall call the period of the chain for the state $i$.


Proposition 6-28: The period of a recurrent chain for the state $i$ is a
constant independent of the state $i$.

Proof: Let $i$ and $j$ be any two states in the chain. Since the chain
is recurrent, $i$ and $j$ communicate. Let $d$ be the period associated with
state $i$ and let $d'$ be the period associated with state $j$. Suppose the
minimum possible time for the process to go from state $i$ to state $j$ is $s$,
and suppose the minimum time for the process to go from $j$ to $i$ is $t$.
By Lemma 6-27 let $N$ be large enough so that the process can return to
$j$ in $nd'$ steps for all $n \ge N$. Then the process can go from $i$ to $j$ in $s$
steps, return to $j$ in $Nd'$ steps, and go back to $i$ in $t$ steps. Hence
$d \mid (s + Nd' + t)$. Similarly, $d \mid (s + (N + 1)d' + t)$. Thus $d$ divides
the difference, or $d \mid d'$. Reversing the roles of $i$ and $j$, we find that
$d' \mid d$. Therefore, $d = d'$.
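The period can be computed from its definition by tracking which return times are possible. The Python sketch below uses NumPy for boolean matrix powers; the seven-state chain is a hypothetical example with one cycle of length 4 and one of length 6 through state 0. For every state it collects the return times $k \le 60$ with $(P^k)_{ii} > 0$ and takes their gcd, illustrating Proposition 6-28: each state gives the same period, here 2. The finite horizon is an assumption that is harmless for this small example but would need justification in general.

```python
from functools import reduce
from math import gcd

import numpy as np

# Hypothetical recurrent chain: 0->1->2, then 2->3->0 (a 4-cycle)
# or 2->4->5->6->0 (a 6-cycle), each branch with probability 1/2.
P = np.zeros((7, 7))
for a, b in [(0, 1), (1, 2), (3, 0), (4, 5), (5, 6), (6, 0)]:
    P[a, b] = 1.0
P[2, 3] = P[2, 4] = 0.5


def period(P: np.ndarray, i: int, horizon: int = 60) -> int:
    """gcd of the k <= horizon with (P^k)_ii > 0."""
    step = (P > 0).astype(int)
    reach = step.copy()                  # reach[u, v] = 1 iff (P^k)_uv > 0
    times = []
    for k in range(1, horizon + 1):
        if reach[i, i]:
            times.append(k)
        reach = ((reach @ step) > 0).astype(int)
    return reduce(gcd, times, 0)


print([period(P, i) for i in range(7)])  # [2, 2, 2, 2, 2, 2, 2]
```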


We may thus speak of the period of a recurrent chain without
ambiguity. Every recurrent chain has a period, and that period is
finite. If, for example, $P$ is a recurrent chain in which $P_{ii} > 0$ for
some $i$, then $P$ has period one.

Definition 6-29: A recurrent chain is said to be noncyclic if its period
is one and cyclic if its period is greater than one.


Let $P$ be a recurrent chain of period $d$. Define a relation $R$ on the
states of $P$ by the following: We say that $i\,R\,j$ if and only if, starting at
$i$, the process can reach $j$ in $md$ steps for some $m$. From the definition
of the period $d$, it follows that $i\,R\,i$. The symmetry of $R$ follows from
the fact that $md$ plus the time to return from $j$ to $i$ must be a multiple of $d$.
To see that $R$ is transitive, we note that if $j$ can be reached from $i$ in $md$
steps and if $k$ can be reached from $j$ in $nd$ steps, then $k$ can be reached
from $i$ in $(m + n)d$ steps.

Thus $R$ partitions the states into cyclic subclasses. The reader may
verify that there are $d$ distinct subclasses and that the $n$th class contains
all those states which it is possible to reach from the starting state only
at times which are congruent to $n$ modulo $d$. The process moves
cyclically through the classes in the specified order. Furthermore, if
the chain is watched after every $d$th step, the resulting process is again
a Markov chain (by the strong Markov property), and its behavior will
be noncyclic. The transition matrix for the new process is $P^d$, and its
form is that of $d$ separate recurrent chains:

$$P^d = \begin{pmatrix} \ast & & 0 \\ & \ddots & \\ 0 & & \ast \end{pmatrix} \quad (d \text{ blocks}).$$

The entries in each block are the entries of a recurrent noncyclic chain,
and the entries which are not in any block are all zeros.
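The cyclic subclasses can also be found without matrix powers. The sketch below uses a standard graph technique that is not taken from the text: a breadth-first search from one state assigns levels, the period is the gcd of $\text{level}(u) + 1 - \text{level}(v)$ over all transitions $u \to v$ with $P_{uv} > 0$, and each state's class is its level modulo $d$. For a hypothetical seven-state chain (one 4-cycle and one 6-cycle through state 0), it then checks that $P^d$ has no positive entries between different classes, which is the block-diagonal form of $P^d$ described in the text.

```python
from collections import deque
from functools import reduce
from math import gcd

import numpy as np

# Hypothetical recurrent chain with cycles of lengths 4 and 6 through state 0.
P = np.zeros((7, 7))
for a, b in [(0, 1), (1, 2), (3, 0), (4, 5), (5, 6), (6, 0)]:
    P[a, b] = 1.0
P[2, 3] = P[2, 4] = 0.5
n = P.shape[0]
edges = [(u, v) for u in range(n) for v in range(n) if P[u, v] > 0]

# Breadth-first levels from state 0.
level = {0: 0}
queue = deque([0])
while queue:
    u = queue.popleft()
    for v in range(n):
        if P[u, v] > 0 and v not in level:
            level[v] = level[u] + 1
            queue.append(v)

d = reduce(gcd, (level[u] + 1 - level[v] for u, v in edges), 0)
classes = [level[s] % d for s in range(n)]

# Every transition goes from class c to class c + 1 (mod d) ...
assert all(classes[v] == (classes[u] + 1) % d for u, v in edges)
# ... so P^d has no positive entries between different classes.
P_d = np.linalg.matrix_power(P, d)
assert all(P_d[u, v] == 0 for u in range(n) for v in range(n)
           if classes[u] != classes[v])
print(d, classes)   # 2 [0, 1, 0, 1, 1, 0, 1]
```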

The observation that $P^d$ is really $d$ separate recurrent noncyclic
chains enables us to study representatively the properties of all
recurrent chains by considering only noncyclic chains. Thus, it is to
noncyclic chains that we now turn our attention. The main tool
in their study will be chains representing sums of independent random
variables.
4. Sums of independent random variables


We have already investigated some of the properties of sums of 
independent random variables Markov chains. Such processes are 
especially important because of how they arise from general recurrent 
chains (see Proposition 6-32), and it is for this reason that we now 
discuss their origin. 

For concreteness we shall confine ourselves to sums of independent
random variables chains defined on the integers. Before recalling the
definition of independent random variables, we remark that if $y$ is a
real-valued function defined on a probability space and having a
denumerable range, then a necessary and sufficient condition for $y$ to
be measurable (and hence to be a random variable) is that the inverse
image under $y$ of every one-point set be measurable. The condition is
necessary because $\{\omega \mid y(\omega) = c\} = \{\omega \mid c \le y(\omega) \le c\}$ must be
measurable, and it is sufficient because $\{\omega \mid y(\omega) < c\}$ is a countable union of
such sets. Therefore, if $y$ is a denumerable-valued random variable
and if $E$ is an arbitrary set of real numbers, the set $\{\omega \mid y(\omega) \in E\}$ is
measurable.


Definition 6-30: The denumerable-valued random variables $y_1, y_2,
y_3, \ldots$ defined on $\Omega$ are independent if, for every finite collection of sets
$E_1, E_2, \ldots, E_m$ of reals, it is true that

$$\Pr[y_k(\omega) \in E_k \text{ for } k = 1, \ldots, m] = \prod_{k=1}^m \Pr[y_k(\omega) \in E_k].$$

The random variables are identically distributed if, for any $m$ and $n$ and
for any set $E$ of reals, it is true that

$$\Pr[y_n(\omega) \in E] = \Pr[y_m(\omega) \in E].$$


An independent process $\{y_n\}$ was defined in Section 2-5 as one in
which the statements $y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}$ and $y_n = c_n$ are
probabilistically independent for every $n > 0$ and for every choice of
the c's. We see that an independent process is that special case of a
Markov process in which $\Pr_\pi[y_{n+1} = j \mid y_n = i]$ is independent of i.
Moreover, an independent process is a Markov chain if and only if it is 
an independent trials process. 


Proposition 6-31: Let $\{y_n\}$ be a stochastic process defined from a
sequence space $\Omega$ to a denumerable set of real numbers S. The
stochastic process is an independent process if and only if the $\{y_n\}$
are independent random variables. It is an independent trials process
if and only if the $\{y_n\}$ are independent and identically distributed.
PROOF: We are to prove first that the $\{y_n\}$ are independent if and only
if the statements $y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}$ and $y_n = c_n$ are
probabilistically independent for any $n \ge 1$ and for any choice of the
c's. Independence of the y's means

$$\Pr[y_0 = c_0 \wedge \cdots \wedge y_n = c_n] = \prod_{k=0}^{n} \Pr[y_k = c_k]$$

and

$$\Pr[y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}] = \prod_{k=0}^{n-1} \Pr[y_k = c_k]$$

for all n. This statement holds if and only if

$$\Pr[y_0 = c_0 \wedge \cdots \wedge y_n = c_n] = \Pr[y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}]\,\Pr[y_n = c_n]$$

for all n. Second, we are to prove that the $\{y_n\}$ are also identically
distributed if and only if

$$\Pr[y_n = c] = \Pr[y_m = c]$$

for any n and m. But this assertion is clear from Definition 6-30.


Let $\{y_n\}$ for $n \ge 0$ be a sequence of independent random variables
which are identically distributed for $n \ge 1$ and which have range in the
union of the integers and $\{-\infty, +\infty\}$, and define inductively

$$x_0 = y_0 \qquad \text{and} \qquad x_{n+1} = y_{n+1} + x_n \quad \text{for } n \ge 0.$$

If the $y_n$ are finite-valued a.e., we claim that the random variables $x_n$
are the outcome functions for a sums of independent random variables
process on the integers with starting distribution $\pi_i = \Pr[y_0 = i]$.
Setting

$$p_k = \Pr[y_n = k], \qquad n > 0,$$

which is a constant not depending on any other function in the sequence
(by independence) and not depending on n (by identical distributions),
we see that $\sum_k p_k = 1$ since $y_n$ is finite-valued a.e. Moreover, if
$\Pr[x_0 = a \wedge \cdots \wedge x_{n-2} = h \wedge x_{n-1} = i] > 0$ with $n > 0$, then

$$\begin{aligned}
\Pr[x_n = j \mid x_0 = a \wedge x_1 = b \wedge \cdots \wedge x_{n-2} = h \wedge x_{n-1} = i]
&= \Pr[y_n = j - i \mid y_0 = a \wedge y_1 = b - a \wedge \cdots \wedge y_{n-1} = i - h] \\
&= \Pr[y_n = j - i] \\
&= p_{j-i}.
\end{aligned}$$

Hence the process is a Markov chain representing sums of independent
random variables.

Conversely, let P be the transition matrix for a sums of independent
random variables Markov chain with state space the integers and with
outcome functions $x_n$. Let $y_n = x_n - x_{n-1}$ for $n > 0$. We shall show
that $x_0$ and the $y_n$ are independent and that the $y_n$ are identically
distributed; it is clear that $x_0$ and the $y_n$ are finite-valued a.e. In fact,
we have

$$\begin{aligned}
\Pr_\pi[x_0 = c_0 \wedge (y_k = c_k \text{ for } 1 \le k \le n)]
&= \pi_{c_0} P_{c_0,\, c_0 + c_1} P_{c_0 + c_1,\, c_0 + c_1 + c_2} \cdots P_{c_0 + \cdots + c_{n-1},\, c_0 + \cdots + c_n} \\
&= \pi_{c_0}\, p_{c_1} \cdots p_{c_n} \\
&= \Pr_\pi[x_0 = c_0] \prod_{k=1}^{n} \Pr_\pi[y_k = c_k],
\end{aligned}$$

and independence follows by taking countable disjoint unions of such
statements; since

$$\Pr_\pi[y_n = j] = p_j,$$

the $y_n$ are identically distributed.
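The factorization in the converse can be checked on a small example. In the sketch below the step distribution `p` and the starting distribution `pi` are hypothetical. The code computes path probabilities from the transition rule $P_{ij} = p_{j-i}$ and compares them with $\pi_{c_0} p_{c_1} \cdots p_{c_n}$. It also checks that each increment $y_k$ has marginal law p.

```python
from fractions import Fraction
from itertools import product

# Hypothetical step distribution p_k = Pr[y_n = k] and starting distribution pi.
p = {-1: Fraction(1, 4), 0: Fraction(1, 4), 2: Fraction(1, 2)}
pi = {0: Fraction(1, 3), 1: Fraction(2, 3)}
n_steps = 3

def P(i, j):
    """Transition probability of the sums chain on the integers: P_ij = p_{j-i}."""
    return p.get(j - i, Fraction(0))

def path_prob(c0, steps):
    """Pr_pi[x_0 = c0 and y_k = steps[k-1] for k = 1..n], computed from P."""
    prob, state = pi.get(c0, Fraction(0)), c0
    for step in steps:
        prob *= P(state, state + step)
        state += step
    return prob

def product_form(c0, steps):
    """pi_{c_0} p_{c_1} ... p_{c_n}, the right side of the identity."""
    prob = pi[c0]
    for step in steps:
        prob *= p[step]
    return prob

paths = [(c0, steps) for c0 in pi for steps in product(p, repeat=n_steps)]
factorizes = all(path_prob(c0, s) == product_form(c0, s) for c0, s in paths)

def marginal(k, value):
    """Pr_pi[y_k = value], summed over all paths of length n_steps."""
    return sum(path_prob(c0, s) for c0, s in paths if s[k - 1] == value)

identically_distributed = all(marginal(k, v) == p[v]
                              for k in range(1, n_steps + 1) for v in p)
print(factorizes, identically_distributed)
```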

Sums of independent random variables appear in a natural way in 
the study of recurrent chains. The result to follow associates to every 
recurrent chain P a sums of independent random variables chain P* 
with state space the integers. 


Proposition 6-32: Let P be a recurrent chain with outcome functions
$x_n$. For a fixed state s let $t_n(\omega)$ be the (n + 1)st time on the path $\omega$
that state s is reached. Then the random times $t_n$ for $n \ge 0$ are the
outcome functions for a sums of independent random variables ladder
process $P^*$ with state space the integers.


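PROOF: If $\Pr_\pi[t_0 = c_0 \wedge \cdots \wedge t_{n-1} = c_{n-1}] > 0$, then

$$\begin{aligned}
&\Pr_\pi[t_n = c_n \mid t_0 = c_0 \wedge \cdots \wedge t_{n-1} = c_{n-1}] \\
&\quad= \Pr_\pi[x_{c_{n-1}+1} \ne s \wedge \cdots \wedge x_{c_n - 1} \ne s \wedge x_{c_n} = s \mid t_0 = c_0 \wedge \cdots \wedge t_{n-1} = c_{n-1}] \\
&\quad= \Pr_s[x_1 \ne s \wedge \cdots \wedge x_{c_n - c_{n-1} - 1} \ne s \wedge x_{c_n - c_{n-1}} = s] \qquad \text{by Theorem 4-9} \\
&\quad= \Pr_s[\bar t_s = c_n - c_{n-1}],
\end{aligned}$$

where $\bar t_s$ is the time to return to state s. Hence

$$\Pr_\pi[t_n = j \mid t_0 = a \wedge \cdots \wedge t_{n-1} = i] = F^{(j-i)}_{ss}.$$

Set $P^*_{ij} = F^{(j-i)}_{ss}$ (with $F^{(k)}_{ss} = 0$ for $k \le 0$). Since by recurrence of P

$$(P^*\mathbf{1})_i = \sum_j P^*_{ij} = \sum_j F^{(j-i)}_{ss} = \bar H_{ss} = 1,$$

$P^*$ is a sums of independent random variables Markov chain.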
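The ladder chain $P^*$ is built from the first-return probabilities $F^{(n)}_{ss}$. The sketch below computes these for a hypothetical three-state recurrent chain. It does so by propagating the probability mass that has not yet returned to s. The code then checks numerically that a row of $P^*$ sums to $\bar H_{ss} = 1$, up to truncation at a finite horizon. As a consistency check, it also confirms that the mean return time equals $1/\alpha_s = 4$. This chain's regular probability measure is (1/4, 1/2, 1/4).

```python
def first_return_probs(P, s, nmax):
    """F_ss^{(n)} for n = 1..nmax: probability that the first return
    to state s after time 0 happens exactly at time n."""
    states = range(len(P))
    probs = [P[s][s]]
    # taboo[k] = Pr_s[x_m = k and x_1, ..., x_m all differ from s], for m = 1
    taboo = {k: P[s][k] for k in states if k != s}
    for _ in range(2, nmax + 1):
        probs.append(sum(taboo[k] * P[k][s] for k in taboo))
        taboo = {j: sum(taboo[k] * P[k][j] for k in taboo) for j in taboo}
    return probs

# Hypothetical recurrent (ergodic) chain on {0, 1, 2}.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
s = 0
F = first_return_probs(P, s, 400)

def P_star(i, j):
    """Ladder transition P*_ij = F_ss^{(j-i)}, zero unless j > i."""
    return F[j - i - 1] if 0 < j - i <= len(F) else 0.0

row_sum = sum(P_star(0, j) for j in range(1, len(F) + 1))  # approximates H-bar_ss = 1
mean_return = sum(n * f for n, f in enumerate(F, start=1))
print(row_sum, mean_return)
```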
The following is a converse to the preceding result. 


Proposition 6-33: Let P’ be a sums of independent random variables 
ladder process on the integers. There exists a recurrent chain P and a 
state s such that the times to return to s are the outcome functions 
for P’. 


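PROOF: Let $\bar p_k = P'_{0k}$ for $k \ge 1$; then $\sum_{k=1}^{\infty} \bar p_k = 1$. We take P to
be a basic example and s to be state 0; the values of $p_i$ and $q_i$ in the
basic example are yet to be specified. Define recursively the $q_i$'s by the
relations

$$\bar p_1 = q_0 = \beta_0 - \beta_1$$

and

$$\bar p_n = p_0 \cdots p_{n-2}\, q_{n-1} = \beta_{n-1} q_{n-1} = \beta_{n-1} - \beta_n.$$

In P we have $\Pr_0[t_1 - t_0 = k] = \bar p_k$ as required; it remains to be proved
that P is recurrent. We have

$$\sum_{k=1}^{n} \bar p_k = \sum_{k=1}^{n} (\beta_{k-1} - \beta_k) = \beta_0 - \beta_n = 1 - \beta_n,$$

and since $\sum_{k=1}^{\infty} \bar p_k = 1$, we must have $\lim_n \beta_n = 0$. Hence P is
recurrent.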
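The recursive choice of the $q_i$ can be carried out mechanically. The sketch below uses a hypothetical ladder distribution $\bar p$ with finite support, so only finitely many states of the basic example matter. It computes $\beta_n$ and $q_{n-1} = \bar p_n/\beta_{n-1}$ with exact fractions. It then confirms two things: the first return to 0 has distribution $\bar p$, and $\beta_n$ reaches 0.

```python
from fractions import Fraction

def basic_example_from_ladder(pbar):
    """Given pbar[k] = P'_{0k} for k = 1..K (pbar[0] unused, sum = 1),
    return (q, beta) for a basic example whose return times to 0 have
    distribution pbar. Uses pbar_n = beta_{n-1} q_{n-1} and
    beta_n = beta_{n-1} - pbar_n, with beta_0 = 1."""
    beta, q = [Fraction(1)], []
    for n in range(1, len(pbar)):
        if beta[n - 1] == 0:
            q.append(Fraction(1))  # state n-1 is never reached; value is irrelevant
        else:
            q.append(pbar[n] / beta[n - 1])
        beta.append(beta[n - 1] - pbar[n])
    return q, beta

# Hypothetical ladder distribution: jumps of size 1, 3, 4.
pbar = [Fraction(0), Fraction(1, 2), Fraction(0), Fraction(1, 3), Fraction(1, 6)]
q, beta = basic_example_from_ladder(pbar)
p = [1 - x for x in q]

def first_return(k):
    """Pr_0[first return to 0 at time k] = p_0 ... p_{k-2} q_{k-1}."""
    prob = Fraction(1)
    for i in range(k - 1):
        prob *= p[i]
    return prob * q[k - 1]

print([first_return(k) for k in range(1, len(pbar))], beta[-1])
```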


We close this section with two remarks about sums of independent 
random variables and their relation to recurrence. First, we have seen 
in Proposition 5-22 that a sums of independent random variables 
process on the integers with finitely many k-values is a recurrent chain 
if and only if the k-values have mean zero and their greatest common 
divisor is one. Second, we note that an infinite recurrent chain
representing sums of independent random variables must be null, since
$\alpha = \mathbf{1}^T$ is regular and $\mathbf{1}^T\mathbf{1} = \infty$.


5. Convergence theorem for noncyclic chains 


By restricting our attention to noncyclic recurrent chains, we can
prove a stronger result than the Mean Ergodic Theorem, namely that
$P^n$ itself converges with n to a limiting matrix. We shall give two
proofs of this convergence theorem: the first a matrix proof using sums
of independent random variables and the second the classical proof
using the Renewal Theorem of Section 1-6f. We shall further show
that, conversely, the truth of the convergence theorem just when P is 
the basic example implies the full validity of the Renewal Theorem. 

We begin by proving two lemmas needed in both proofs of the 
convergence theorem; their effect is to formulate noncyclicity in a 
number-theoretic way. 


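Lemma 6-34: For any Markov chain and any states i and j,

$$P^{(0)}_{ij} = \delta_{ij}, \qquad F^{(0)}_{ij} = 0,$$

and

$$P^{(n)}_{ij} = \sum_{k=1}^{n} F^{(k)}_{ij} P^{(n-k)}_{jj} = \sum_{k=0}^{n-1} P^{(k)}_{jj} F^{(n-k)}_{ij} \qquad \text{for } n > 0.$$

PROOF: The first two statements are obvious; for the third we first
note that if $\Pr_i[\bar t_j = k] > 0$, then for $n \ge k$,

$$\begin{aligned}
\Pr_i[x_n = j \mid \bar t_j = k] &= \Pr_i[x_n = j \mid x_k = j \wedge x_{k-1} \ne j \wedge \cdots \wedge x_1 \ne j] \\
&= \Pr_i[x_n = j \mid x_k = j] \qquad \text{by Lemma 4-6} \\
&= \Pr_j[x_{n-k} = j] \qquad \text{by Lemma 4-6} \\
&= P^{(n-k)}_{jj}.
\end{aligned}$$

Hence no matter what the value of $\Pr_i[\bar t_j = k]$, it is true that, for
$n \ge k$,

$$\Pr_i[\bar t_j = k]\,\Pr_i[x_n = j \mid \bar t_j = k] = F^{(k)}_{ij} P^{(n-k)}_{jj}.$$

Using $x_n = j \wedge \bar t_j = k$, $1 \le k \le n$, as a set of alternatives for $x_n = j$,
we have

$$P^{(n)}_{ij} = \sum_{k=1}^{n} \Pr_i[\bar t_j = k]\,\Pr_i[x_n = j \mid \bar t_j = k] = \sum_{k=1}^{n} F^{(k)}_{ij} P^{(n-k)}_{jj} = \sum_{k=0}^{n-1} P^{(k)}_{jj} F^{(n-k)}_{ij}$$

by a change of variable.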
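Both forms of the decomposition in Lemma 6-34 can be verified exactly on a small example. The sketch below uses a hypothetical three-state chain with rational entries. It computes $F^{(n)}_{ij}$ by removing, at each step, the probability mass that has just entered j. It then compares both convolution sums with $P^{(n)}_{ij}$ for every pair of states and every $n \le 8$.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matrix_powers(P, nmax):
    """[P^0, P^1, ..., P^nmax], computed exactly."""
    size = len(P)
    powers = [[[Fr(int(i == j)) for j in range(size)] for i in range(size)]]
    for _ in range(nmax):
        powers.append(matmul(powers[-1], P))
    return powers

def first_passage(P, i, j, nmax):
    """[F_ij^{(0)}, ..., F_ij^{(nmax)}]: first visit to j at a time n >= 1."""
    size = len(P)
    f = [Fr(0)]                                   # F_ij^{(0)} = 0
    v = [Fr(int(k == i)) for k in range(size)]    # start at i at time 0
    for _ in range(nmax):
        v = [sum(v[k] * P[k][m] for k in range(size)) for m in range(size)]
        f.append(v[j])    # mass entering j now is a first visit
        v[j] = Fr(0)      # remove it so later visits are not counted
    return f

# Hypothetical chain with rational entries.
P = [[Fr(1, 2), Fr(1, 3), Fr(1, 6)],
     [Fr(1, 4), Fr(0), Fr(3, 4)],
     [Fr(1), Fr(0), Fr(0)]]
N = 8
Pn = matrix_powers(P, N)

ok = True
for i in range(3):
    for j in range(3):
        f = first_passage(P, i, j, N)
        for n in range(1, N + 1):
            left = sum(f[k] * Pn[n - k][j][j] for k in range(1, n + 1))
            right = sum(Pn[k][j][j] * f[n - k] for k in range(0, n))
            ok = ok and left == Pn[n][i][j] == right
print(ok)
```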


Lemma 6-35: A recurrent chain is noncyclic if and only if the set
$Z = \{k \mid F^{(k)}_{ii} > 0\}$ has greatest common divisor one.

PROOF: If Z has greatest common divisor one, then the period for the
state i is one, since $P^{(k)}_{ii} \ge F^{(k)}_{ii} > 0$ for every k in Z. Conversely, suppose
that the greatest common divisor is c. We shall show that c divides d, the
period of the chain. Hence, if $c > 1$, the chain is cyclic. Suppose some n
with $P^{(n)}_{ii} > 0$ is not divisible by c; let n be the smallest such integer,
and write $n = qc + r$ with $0 < r < c$. By Lemma 6-34, and since $F^{(k)}_{ii} = 0$
unless $c \mid k$,

$$P^{(n)}_{ii} = \sum_{k=1}^{n} F^{(k)}_{ii} P^{(n-k)}_{ii} = \sum_{m=1}^{q} F^{(mc)}_{ii} P^{(n-mc)}_{ii}.$$

Then the right side is zero, since every term $P^{(n-mc)}_{ii}$ is zero ($n - mc$ is
smaller than n and is not divisible by c), a contradiction. Thus $P^{(n)}_{ii} > 0$
only if $c \mid n$.
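Lemma 6-35 can be illustrated numerically. For a fixed state i, the sketch compares the greatest common divisor of $Z = \{k \mid F^{(k)}_{ii} > 0\}$ with the period $\gcd\{n \mid P^{(n)}_{ii} > 0\}$. It truncates both sets at a finite horizon, which is an assumption that is harmless for these small examples. It uses two hypothetical chains. The first is the random walk on a 4-cycle, which has period 2. The second is a three-state chain whose returns to 0 occur at times 2 and 3.

```python
from fractions import Fraction as Fr
from functools import reduce
from math import gcd

def step(v, P):
    return [sum(v[k] * P[k][m] for k in range(len(P))) for m in range(len(P))]

def return_sets(P, i, nmax):
    """(Z, R) with Z = {k <= nmax : F_ii^{(k)} > 0}, R = {n <= nmax : P_ii^{(n)} > 0}."""
    Z, R = [], []
    free = [Fr(int(k == i)) for k in range(len(P))]  # mass not yet returned to i
    full = free[:]                                   # unrestricted distribution
    for n in range(1, nmax + 1):
        free, full = step(free, P), step(full, P)
        if free[i] > 0:
            Z.append(n)
        if full[i] > 0:
            R.append(n)
        free[i] = Fr(0)
    return Z, R

h = Fr(1, 2)
cycle4 = [[0, h, 0, h], [h, 0, h, 0], [0, h, 0, h], [h, 0, h, 0]]
# 0 -> 1; from 1 go back to 0 or on to 2; from 2 go to 0.
mixed = [[0, 1, 0], [h, 0, h], [1, 0, 0]]

results = {}
for name, P in [("cycle4", cycle4), ("mixed", mixed)]:
    Z, R = return_sets(P, 0, 12)
    results[name] = (reduce(gcd, Z, 0), reduce(gcd, R, 0))
print(results)
```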


The next two lemmas lead to the convergence theorem; the first one 
is a consequence of Proposition 6-32 and the zero-one law for sums of 
independent random variables. 


Lemma 6-36: Let P be a noncyclic recurrent chain, and for a fixed
state s let E and F be any two sets of integers whose union is the set of
all non-negative integers. Then either

$$\Pr_\pi[x_n = s \text{ for infinitely many } n \in E] = 1$$

or

$$\Pr_\pi[x_n = s \text{ for infinitely many } n \in F] = 1 \qquad \text{(or both)},$$

and whichever alternative holds is independent of the starting
distribution $\pi$.


PROOF: Form the process $P^*$ of Proposition 6-32. We shall first
show that for any two states i and j there is a state k which it is possible
to reach from both i and j; for this purpose it is sufficient to show that
from state 0 it is possible to reach all sufficiently large states, since $P^*$
represents sums of independent random variables. Now the set of
states which can be reached from 0 is non-empty and is closed under
addition (since $P^*$ represents sums of independent random variables);
its greatest common divisor is one by Lemma 6-35. Hence by Lemma
1-66 all sufficiently large states can be reached.

By the zero-one law, which is Propositions 5-19 and 5-20,

$$\Pr_i[t_n \in E \text{ infinitely often}] = \Pr_i[x_n = s \text{ for infinitely many } n \in E]$$

is zero or one and is independent of i. Thus

$$\Pr_\pi[x_n = s \text{ for infinitely many } n \in E]$$

is zero or one and is independent of $\pi$. If

$$\Pr_\pi[x_n = s \text{ for infinitely many } n \in E] = 0$$

and

$$\Pr_\pi[x_n = s \text{ for infinitely many } n \in F] = 0,$$

then

$$\Pr_\pi[x_n = s \text{ infinitely often}] = 0,$$

in contradiction to the recurrence of P.
The following lemma uses the notation $\|f\| = \sum_i |f_i|$.


Lemma 6-37: Let P be a noncyclic recurrent chain, and let $\beta$ and $\gamma$ be
probability vectors. Then $\lim_{n\to\infty} \|(\beta - \gamma)P^n\| = 0$.


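PROOF: Let $E = \{n \mid (\beta P^n)_s \le (\gamma P^n)_s\}$ and $F = \{n \mid (\beta P^n)_s \ge (\gamma P^n)_s\}$.
By Lemma 6-36, either

$$\Pr_\beta[x_n = s \text{ for infinitely many } n \in E] = 1$$

or

$$\Pr_\beta[x_n = s \text{ for infinitely many } n \in F] = 1,$$

and by symmetry we may assume that the former alternative holds.
Let $h^{(n)}$ be the statement that $x_m = s$ for some $m \in E$ with $m < n$,
and let

$$\beta^{(n)}_k = \Pr_\beta[x_n = k \wedge \sim h^{(n)}].$$

Then

$$\|\beta^{(n)}\| = \beta^{(n)}\mathbf{1} = \Pr_\beta[\sim h^{(n)}] \to 0$$

by the above assumption. Also $\beta^{(0)} = \beta$ and

$$\beta^{(n+1)}_j = \begin{cases} \sum_k \beta^{(n)}_k P_{kj} & \text{if } n \notin E, \\ \sum_{k \ne s} \beta^{(n)}_k P_{kj} & \text{if } n \in E. \end{cases}$$

We may represent this last identity conveniently by using $e_s$, a row
vector such that $(e_s)_j = \delta_{sj}$, and by defining

$$\delta^{(n)} = \begin{cases} \beta^{(n)}_s e_s & \text{if } n \in E, \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\beta^{(n+1)} = (\beta^{(n)} - \delta^{(n)})P.$$

Next, we define

$$\gamma^{(n)} = \beta^{(n)} + (\gamma - \beta)P^n.$$

From this relation,

$$\gamma^{(n-1)}P = \beta^{(n-1)}P + (\gamma - \beta)P^n$$

and

$$\gamma^{(0)} = \gamma,$$

and hence

$$\gamma^{(n)} = \gamma^{(n-1)}P + \beta^{(n)} - \beta^{(n-1)}P = (\gamma^{(n-1)} - \delta^{(n-1)})P.$$

We shall show by induction that $\gamma^{(n)} \ge \delta^{(n)}$. First, $\gamma^{(0)} = \gamma \ge 0$.
If $0 \notin E$, then $\delta^{(0)} = 0 \le \gamma^{(0)}$. If $0 \in E$, then $\gamma^{(0)}_s = \gamma_s \ge \beta_s = \beta^{(0)}_s = \delta^{(0)}_s$
by the definition of E, since $\beta^{(0)} = \beta$. Thus $\gamma^{(0)} \ge \delta^{(0)}$ in either case.
Suppose $\gamma^{(n-1)} \ge \delta^{(n-1)}$. Then

$$\gamma^{(n)} = (\gamma^{(n-1)} - \delta^{(n-1)})P \ge 0.$$

If $n \notin E$, then $\delta^{(n)} = 0$ and $\gamma^{(n)} \ge \delta^{(n)}$. If $n \in E$, then $[(\gamma - \beta)P^n]_s \ge 0$
by the definition of E, and hence

$$\gamma^{(n)}_s \ge \beta^{(n)}_s = \delta^{(n)}_s$$

by the definition of $\gamma^{(n)}$. Thus $\gamma^{(n)} \ge \delta^{(n)}$ for every n.
In particular, we have $\gamma^{(n)} \ge 0$. Thus, since $P^n\mathbf{1} = \mathbf{1}$ and $\gamma\mathbf{1} = \beta\mathbf{1} = 1$,

$$\begin{aligned}
\|\gamma^{(n)}\| = \gamma^{(n)}\mathbf{1} &= \beta^{(n)}\mathbf{1} + [(\gamma - \beta)P^n]\mathbf{1} \\
&= \beta^{(n)}\mathbf{1} + (\gamma - \beta)\mathbf{1} \\
&= \beta^{(n)}\mathbf{1} \\
&= \|\beta^{(n)}\| \to 0.
\end{aligned}$$

Finally

$$\|\beta P^n - \gamma P^n\| = \|\beta^{(n)} - \gamma^{(n)}\|$$

by the definition of $\gamma^{(n)}$, and the right side is

$$\le \|\beta^{(n)}\| + \|\gamma^{(n)}\| \to 0.$$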
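Lemma 6-37, and its failure for cyclic chains, can be seen numerically. The sketch below works in floating point and uses hypothetical example chains. It tracks $\|(\beta - \gamma)P^n\|$ for a noncyclic three-state chain, where the gap tends to 0. It does the same for the cyclic random walk on a 4-cycle. There, two point masses in different cyclic classes stay at distance 2 forever. Finally, the code checks the conclusion of Theorem 6-38 that $\pi P^n \to \alpha$ for the noncyclic chain.

```python
def step(v, P):
    """One step of the row vector v under P: returns vP."""
    return [sum(v[k] * P[k][j] for k in range(len(P))) for j in range(len(P))]

def l1_gap(beta, gamma, P, n):
    """||(beta - gamma) P^n|| = sum_i |(beta P^n)_i - (gamma P^n)_i|."""
    for _ in range(n):
        beta, gamma = step(beta, P), step(gamma, P)
    return sum(abs(b - g) for b, g in zip(beta, gamma))

noncyclic = [[0.5, 0.5, 0.0],
             [0.25, 0.5, 0.25],
             [0.0, 0.5, 0.5]]
cyclic = [[0.0, 0.5, 0.0, 0.5],
          [0.5, 0.0, 0.5, 0.0],
          [0.0, 0.5, 0.0, 0.5],
          [0.5, 0.0, 0.5, 0.0]]

gaps = [l1_gap([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], noncyclic, n) for n in (1, 10, 50)]
cyclic_gap = l1_gap([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], cyclic, 51)

# Theorem 6-38: pi P^n -> alpha = (1/4, 1/2, 1/4) for the noncyclic chain.
v = [1.0, 0.0, 0.0]
for _ in range(60):
    v = step(v, noncyclic)
print(gaps, cyclic_gap, v)
```

For this particular noncyclic chain, $(1, 0, -1)$ is a left eigenvector with eigenvalue 1/2. The gap is therefore exactly $2 \cdot 2^{-n}$.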


Theorem 6-38: If P is a noncyclic recurrent chain, then $\lim_{n\to\infty} P^n$
exists. If P is ergodic, then $\lim P^n = A = \mathbf{1}\alpha$ and $\lim_n \|\pi P^n - \alpha\| = 0$
for every probability vector $\pi$. If P is null, then $\lim_n (\pi P^n) = 0$ for
every probability vector $\pi$.




PROOF: Every recurrent chain is either ergodic or null; taking $\pi$ to
be a vector with 1 in the ith entry and zeros elsewhere, we see that the
existence of $\lim P^n$ follows from the other assertions of the theorem.

Let P be ergodic and let $A = \mathbf{1}\alpha$ be the Cesaro limit of the powers of
P. We have $\alpha P^n = \alpha$ for every n. Letting $\beta = \alpha$ and $\gamma = \pi$ in
Lemma 6-37, we obtain the desired result.

Let P be null and suppose the assertion of the theorem is false.
Then by Corollary 1-64, for some probability vector $\pi$, there is an
increasing sequence $\{n_k\}$ of positive integers and there is a row vector
$\rho \ne 0$ such that

$$\lim_k (\pi P^{n_k})_i = \rho_i \qquad \text{for every state } i.$$

Certainly $\rho_i \ge 0$. Summing on i, we obtain

$$\rho\mathbf{1} = \sum_i \rho_i = \sum_i \lim_k (\pi P^{n_k})_i \le \liminf_k \sum_i (\pi P^{n_k})_i = 1,$$

the inequality following from Fatou's Theorem. Applying Lemma
6-37 with $\beta = \pi$ and $\gamma = \pi P$, we see that

$$\lim_k (\pi P^{n_k+1})_i = \rho_i \qquad \text{for each } i.$$

By Fatou's Theorem,

$$\rho P = \Big(\lim_k \pi P^{n_k}\Big)P \le \lim_k \pi P^{n_k+1} = \rho.$$

Hence $\rho$ is non-negative superregular and satisfies $\rho\mathbf{1} < \infty$; $\rho$ must be
regular by Proposition 6-4, and the fact that P is null then contradicts
Theorem 6-9.


Corollary 6-39: If P is a null chain (not necessarily noncyclic), then
$\lim P^n = 0$.

PROOF: Let P have period d. By Theorem 6-38, applied to $P^d$,

$$\lim_n P^{nd} = 0.$$

By dominated convergence, $\lim_n P^{nd+r} = \lim_n P^r P^{nd} = 0$ for each $r = 0, 1, 2, \ldots,
d - 1$. Hence,

$$\lim_n P^n = 0.$$


The classical proof of Theorem 6-38 that follows proves only that
$\lim P^n$ exists.
SECOND PROOF OF THEOREM 6-38: We first prove the theorem for the
ith diagonal entry. Set $f_n = F^{(n)}_{ii}$, $u_n = P^{(n)}_{ii}$, and

$$\mu = \sum_n n f_n = \sum_n n F^{(n)}_{ii} = \sum_n n \Pr_i[\bar t_i = n] = M_i[\bar t_i] = \bar M_{ii}.$$

Lemmas 6-34 and 6-35 establish all the hypotheses of Theorem 1-67
except for the fact that $\sum_n f_n = 1$:

$$\sum_n f_n = \sum_n F^{(n)}_{ii} = \bar H_{ii} = 1.$$

Therefore $u_n \to 1/\mu$ or 0 according as $\mu$ is finite or infinite, and the
value of the limit for the diagonal entries follows from Proposition 6-25.
For the off-diagonal entries, let i and j be any two distinct states.
Define a row vector $\beta$ and a sequence of column vectors $\{g^{(n)}\}$ by

$$\beta_m = F^{(m)}_{ij}, \qquad g^{(n)}_m = \begin{cases} P^{(n-m)}_{jj} & \text{if } n \ge m, \\ 0 & \text{if } n < m. \end{cases}$$

Then $\lim_n g^{(n)} = L_{jj}\mathbf{1}$ exists since we have proved the theorem for diagonal
entries. Furthermore, by Lemma 6-34,

$$P^{(n)}_{ij} = \sum_{m=1}^{n} \beta_m P^{(n-m)}_{jj} = \sum_{m=1}^{\infty} \beta_m g^{(n)}_m = \beta g^{(n)}.$$

Since $\beta\mathbf{1} = 1$ and $g^{(n)} \le \mathbf{1}$, the Dominated Convergence Theorem
applies and

$$\lim_n P^{(n)}_{ij} = \lim_n \beta g^{(n)} = \beta \lim_n g^{(n)} = \beta(L_{jj}\mathbf{1}) = L_{jj}.$$
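The renewal sequence in the second proof can be computed directly from the recursion of Lemma 6-34: $u_0 = 1$ and $u_n = \sum_{k=1}^{n} f_k u_{n-k}$. The sketch below uses a hypothetical distribution $f_1 = 1/2$, $f_3 = 1/3$, $f_4 = 1/6$. The greatest common divisor of its support is one, and its mean is $\mu = 13/6$. The code checks that $u_n$ approaches $1/\mu = 6/13$.

```python
from fractions import Fraction

def renewal_sequence(f, nmax):
    """u_0..u_nmax from u_0 = 1 and u_n = sum_{k=1}^{n} f_k u_{n-k}."""
    u = [1.0]
    for n in range(1, nmax + 1):
        u.append(sum(f.get(k, 0.0) * u[n - k] for k in range(1, n + 1)))
    return u

# Hypothetical first-return distribution f_k = F_ii^{(k)} with gcd{1, 3, 4} = 1.
f_exact = {1: Fraction(1, 2), 3: Fraction(1, 3), 4: Fraction(1, 6)}
mu = sum(k * fk for k, fk in f_exact.items())   # mean return time, 13/6
f = {k: float(v) for k, v in f_exact.items()}
u = renewal_sequence(f, 300)
print(mu, u[300], 1 / float(mu))
```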


As a converse to the second proof of Theorem 6-38, we shall show that
the convergence of $P^n$ for every noncyclic recurrent chain implies the
truth of the Renewal Theorem. This result is of particular interest
because all that is needed is convergence of the diagonal entries of $P^n$
when P is a noncyclic recurrent case of the basic example.
Proposition 6-40: If every noncyclic recurrent chain converges to a
limiting matrix, then the Renewal Theorem holds.

PROOF: Let the sequence $\{f_k\}$ be given. Define $r_i = \sum_{k>i} f_k$; the
$r_i$ tend to 0 because $\sum_k f_k = 1$. As long as $r_i > 0$, define

$$p_i = \frac{r_{i+1}}{r_i} \qquad \text{for } i = 0, 1, 2, \ldots.$$

If $r_{i+1} = 0$ for some i, then $p_i = 0$ and the $p_k$ for $k > i$ are irrelevant.

Set $q_i = 1 - p_i$ and let the $p_i$ and the $q_i$ represent the transition
probabilities associated with the basic example. We have

$$\beta_i = p_0 p_1 \cdots p_{i-1} = \frac{r_1}{r_0}\,\frac{r_2}{r_1} \cdots \frac{r_i}{r_{i-1}} = \frac{r_i}{r_0}.$$

That is, $\beta_i = r_i$, since $r_0 = 1$. Since $r_i \to 0$, $\beta_i \to 0$ and the chain is recurrent.
Now

$$F^{(n)}_{00} = \beta_{n-1}(1 - p_{n-1}) = \beta_{n-1} - \beta_n = r_{n-1} - r_n = f_n.$$

Thus $\mu = \sum_n n f_n = \bar M_{00}$. By Lemma 6-34 we see that $u_n = P^{(n)}_{00}$.
The Markov chain is noncyclic by Lemma 6-35 because the greatest
common divisor of the k's for which $f_k > 0$ is 1. Therefore $\lim u_n =
\lim P^{(n)}_{00}$ exists. On the other hand, by Proposition 6-25 the Cesaro
limit of $P^{(n)}_{00}$ is $1/\bar M_{00} = 1/\mu$ if $\bar M_{00} < \infty$ or 0 if $\bar M_{00} = +\infty$. Hence by
Proposition 1-61

$$\lim_n u_n = \begin{cases} 1/\mu & \text{if } \mu < \infty, \\ 0 & \text{if } \mu = +\infty. \end{cases}$$
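The construction in this proof is easy to carry out for a distribution with finite support. The sketch below reuses the hypothetical distribution $f_1 = 1/2$, $f_3 = 1/3$, $f_4 = 1/6$. It sets $p_i = r_{i+1}/r_i$ and builds the corresponding basic example on the states 0 to 3. State 3 always jumps back to 0, because $p_3 = r_4/r_3 = 0$. The code then checks with exact fractions that $P^{(n)}_{00}$ coincides with the renewal sequence $u_n$.

```python
from fractions import Fraction as Fr

f = {1: Fr(1, 2), 3: Fr(1, 3), 4: Fr(1, 6)}   # hypothetical, sums to 1
K = max(f)

# r_i = sum_{k > i} f_k, and p_i = r_{i+1} / r_i while r_i > 0.
r = [sum(fk for k, fk in f.items() if k > i) for i in range(K + 1)]
p = [r[i + 1] / r[i] for i in range(K)]        # r_0, ..., r_{K-1} are positive here
q = [1 - pk for pk in p]

# Basic example on states 0..K-1: from i move to i+1 w.p. p_i, else jump to 0.
P = [[Fr(0)] * K for _ in range(K)]
for i in range(K):
    P[i][0] += q[i]
    if i + 1 < K:
        P[i][i + 1] += p[i]

def renewal(nmax):
    """u_0 = 1, u_n = sum_{k=1}^{n} f_k u_{n-k}, computed exactly."""
    u = [Fr(1)]
    for n in range(1, nmax + 1):
        u.append(sum(f.get(k, Fr(0)) * u[n - k] for k in range(1, n + 1)))
    return u

def diag_powers(nmax):
    """P_00^{(n)} for n = 0..nmax."""
    v, out = [Fr(1)] + [Fr(0)] * (K - 1), [Fr(1)]
    for _ in range(nmax):
        v = [sum(v[k] * P[k][m] for k in range(K)) for m in range(K)]
        out.append(v[0])
    return out

print(r, p, renewal(30) == diag_powers(30))
```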
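6. Mean first passage time matrix


The matrices M and $\bar M$ have already been defined by

$$M_{ij} = M_i[t_j], \qquad \bar M_{ij} = M_i[\bar t_j].$$

In Proposition 6-25 we saw that

$$\bar M_{ii} = \begin{cases} 1/\alpha_i & \text{if } P \text{ is ergodic and } \alpha\mathbf{1} = 1, \\ \infty & \text{if } P \text{ is null.} \end{cases}$$


Proposition 6-41: In any recurrent chain, $\bar M = E + PM$.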
PROOF: Apply Theorem 4-11 with the random time equal to one.
Then

$$\begin{aligned}
M_i[\bar t_j] &= \sum_k P_{ik} M_k[t_j + 1] \\
&= \sum_k P_{ik}(M_{kj} + 1) \\
&= \sum_k P_{ik} M_{kj} + \sum_k P_{ik} \\
&= (PM)_{ij} + 1.
\end{aligned}$$


For an ergodic chain P we define D to be the diagonal matrix whose
diagonal entries are $1/\alpha_i$, where $\alpha\mathbf{1} = 1$. From Proposition 6-25 we
see that

$$\bar M = D + M.$$


Proposition 6-42: If P is an ergodic chain, the mean first passage
time matrix M satisfies these properties:

(1) $M_{ii} = 0$ and $M \ge 0$.
(2) M is finite-valued.
(3) $(I - P)M = E - D$.

PROOF: The first statement is obvious, and the third follows
immediately from Proposition 6-41 and the identity $\bar M = D + M$ if we
can show that M is finite-valued. The problem thus reduces to proving
(2). We know that $M_{ii} = 0$; therefore let i and j be distinct states.
We shall show that $M_{ji}$ is finite. Let $t = \min(\bar t_i, t_j)$. Then

$$\begin{aligned}
\frac{1}{\alpha_i} = \bar M_{ii} = M_i[\bar t_i]
&\ge \Pr_i[x_t = j]\, M_i[\bar t_i \mid x_t = j] \\
&\ge \Pr_i[x_t = j]\, M_{ji} \qquad \text{by Theorem 4-11} \\
&= {}^iH_{ij}\, M_{ji}.
\end{aligned}$$

If we can show that ${}^iH_{ij} > 0$, we will have

$$M_{ji} \le \frac{1}{\alpha_i\, {}^iH_{ij}} < \infty.$$

But $0 < \alpha_j/\alpha_i = {}^iN_{ij} = {}^iH_{ij}\, {}^iN_{jj}$ by Corollary 6-21 and Proposition
4-15, so that ${}^iH_{ij} > 0$.


The remarkable fact about the mean first passage time matrix M for
ergodic chains is that the converse of Proposition 6-42, namely
Theorem 6-43, is also true. Thus once a candidate for M has been
found, even by guessing, we need check only that it satisfies (1), (2),
and (3).


Theorem 6-43: If P is an ergodic chain, the mean first passage time
matrix M is characterized by these properties:

(1) $M_{ii} = 0$ and $M \ge 0$.
(2) M is finite-valued.
(3) $(I - P)M = E - D$.

PROOF: Proposition 6-42 shows that M satisfies these properties.
Let Y be any matrix for which (1), (2), and (3) hold. Let 0 be an
arbitrary fixed state of the chain. It is sufficient to show that y, the
zeroth column of Y, is equal to m, the zeroth column of M. Forming
the chain ${}^0P$, in which state 0 has been made absorbing, and writing

$${}^0P = \begin{pmatrix} 1 & 0 \\ R & Q \end{pmatrix}, \qquad y = \begin{pmatrix} y_0 \\ \bar y \end{pmatrix}, \qquad m = \begin{pmatrix} m_0 \\ \bar m \end{pmatrix},$$

we see that $\bar m = \{M_i[a]\}$ and by Corollary 5-17 that $\bar m$ is the minimum
finite-valued non-negative solution of the equation $(I - Q)\xi = \mathbf{1}$.
We first show that $\bar y$ is another finite-valued non-negative solution.
We know that $\bar y$ is finite-valued and non-negative by hypothesis. The
identity $(I - P)Y = E - D$ yields, in the zeroth column and the rows other than the zeroth,

$$\begin{pmatrix} -R & I - Q \end{pmatrix}\begin{pmatrix} y_0 \\ \bar y \end{pmatrix} = \mathbf{1}.$$

But $y_0 = 0$ so that $(I - Q)\bar y = \mathbf{1}$. We conclude that $\bar y \ge \bar m$. Since
$y_0 = m_0 = 0$, we have $y \ge m$. Hence

$$(I - P)(y - m) = (I - P)y - (I - P)m = 0$$

and $y - m$ is a finite-valued non-negative P-regular function. Thus
$y - m = c\mathbf{1}$ by Proposition 6-3. Looking at the zeroth entries, we see
that $0 = y_0 - m_0 = c$. Therefore, $y = m$.
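The proof of Theorem 6-43 also suggests a way to compute M. For each target state j, make j absorbing and solve $(I - Q)\xi = \mathbf{1}$ for the mean absorption times. The sketch below does this on a hypothetical three-state ergodic chain. It uses exact Gauss-Jordan elimination through a helper `solve` written for this example, which is not a library routine. It also computes $\alpha$ from $\alpha P = \alpha$ and $\alpha\mathbf{1} = 1$, and then checks properties (1) and (3).

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def mean_first_passage(P):
    """M_ij = M_i[t_j]; column j solves (I - Q) xi = 1 with j made absorbing."""
    n = len(P)
    M = [[Fr(0)] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        A = [[int(a == b) - P[a][b] for b in others] for a in others]
        for i, xi in zip(others, solve(A, [1] * len(others))):
            M[i][j] = xi
    return M

def regular_measure(P):
    """alpha with alpha P = alpha and alpha 1 = 1 (last equation replaced)."""
    n = len(P)
    A = [[int(i == j) - P[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1] * n
    return solve(A, [0] * (n - 1) + [1])

P = [[Fr(1, 2), Fr(1, 2), Fr(0)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(0), Fr(1, 2), Fr(1, 2)]]
M = mean_first_passage(P)
alpha = regular_measure(P)
n = len(P)

prop1 = all(M[i][i] == 0 for i in range(n)) and all(x >= 0 for row in M for x in row)
# (3): ((I - P)M)_ij = 1 - delta_ij / alpha_i, that is, E - D.
prop3 = all(M[i][j] - sum(P[i][k] * M[k][j] for k in range(n))
            == 1 - (1 / alpha[i] if i == j else 0)
            for i in range(n) for j in range(n))
print(alpha, M, prop1, prop3)
```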


7. Examples of the mean first passage time matrix 


In this section we shall compute the mean first passage time matrix 
associated with two infinite recurrent chains. The first example is a 
reflecting random walk, and the second is the basic example. 


EXAMPLE 1: Reflecting random walk. 
A random walk on the non-negative integers is defined by the transition
probabilities

$$\begin{aligned}
P_{00} &= q, \qquad \text{where } q \ne 0 \text{ and } q \ne 1, \\
P_{i,i+1} &= p = 1 - q \qquad \text{for } i \ge 0, \\
P_{i,i-1} &= q \qquad \text{for } i > 0.
\end{aligned}$$

We note that the process P with 0 made absorbing is the infinite
drunkard's walk P'. For the present chain we have

$$\bar H_{00} = pH_{10} + qH_{00} = pH_{10} + q.$$

But $H_{10}$ is the absorption probability $B_{10}$ for the infinite drunkard's
walk. And $B_{10} = 1$ if $p \le q$, and $B_{10} < 1$ if $p > q$. Therefore

$$\bar H_{00} \begin{cases} = 1 & \text{if } p \le q, \\ < 1 & \text{if } p > q, \end{cases}$$

and P is recurrent if and only if $p \le q$.
A similar relation holds for $\bar M_{00}$; we have

$$\bar M_{00} = 1 + pM_{10} + qM_{00} = 1 + pM_{10}.$$

Since $M_{10}$ is the mean time to absorption $M_1[a]$ in the P' chain, we see
that $\bar M_{00}$ is finite if and only if $p < q$. That is,

$$P \text{ is } \begin{cases} \text{transient} & \text{if } p > q, \\ \text{null} & \text{if } p = q, \\ \text{ergodic} & \text{if } p < q. \end{cases}$$

The chain is never cyclic, since $P_{00} > 0$.
We shall compute M for the ergodic case. Let $r = q/p > 1$. A
P-regular measure $\alpha$ must satisfy $\alpha = \alpha P$, or

$$\begin{aligned}
\alpha_0 &= \alpha_0 q + \alpha_1 q, \\
\alpha_i &= \alpha_{i-1} p + \alpha_{i+1} q \qquad \text{for } i > 0.
\end{aligned}$$

From the first equation we obtain $\alpha_1 = \alpha_0/r$, and from the second, which
is a second-order difference equation, we obtain

$$\alpha_i = A + Br^{-i} \qquad \text{for } i \ge 0.$$

From the two equations we find $\alpha_1 = A + B/r$ and $\alpha_0/r = \alpha_1 = (A + B)/r$.
Therefore, $A/r = A$, and since $r > 1$, we must have $A = 0$. Choosing
B so that $\alpha\mathbf{1} = 1$, we arrive at the result

$$\alpha_i = (1 - 1/r)\, r^{-i}.$$
The entries $M_{i0}$ of the mean first passage time matrix are easily
found once the value of $\alpha_0$ is known. Letting m be the zeroth column
of the M matrix, we see from Proposition 6-42 that

$$(I - P)m = \mathbf{1} - \{\delta_{i0}/\alpha_0\}$$

or that

$$m_0 - qm_0 - pm_1 = 1 - 1/\alpha_0$$

and

$$m_i - qm_{i-1} - pm_{i+1} = 1 \qquad \text{for } i > 0.$$

Since $\alpha_0 = 1 - p/q$, we have $1 - 1/\alpha_0 = -p/(q - p)$. The fact that
$m_0 = 0$ then reduces the first relation to

$$-pm_1 = -p/(q - p),$$

so that

$$m_1 = 1/(q - p).$$

The difference equation for $m_i$ has as a general solution

$$m_i = A + B(q/p)^i + i/(q - p) \qquad \text{for } i \ge 0.$$

The boundary conditions on $m_0$ and $m_1$ imply that $A = B = 0$ and
that

$$M_{i0} = m_i = i/(q - p).$$
The computation of $M_{0j}$ uses the same general methods. First, we
note that if $i < j$, then the process must pass through i from 0 to get
to j. Hence

$$M_{0i} + M_{ij} = M_{0j}$$

or

$$M_{ij} = M_{0j} - M_{0i}.$$

Now

$$M_{01} = p + q(1 + M_{01})$$

so that

$$M_{01} = 1/p.$$

For $0 < i < j$,

$$M_{ij} = p(1 + M_{i+1,j}) + q(1 + M_{i-1,j})$$

or

$$M_{0j} - M_{0i} = 1 + p(M_{0j} - M_{0,i+1}) + q(M_{0j} - M_{0,i-1}).$$

Thus for $i > 0$,

$$pM_{0,i+1} - M_{0i} + qM_{0,i-1} = 1,$$

and for $i \ge 0$,

$$pM_{0,i+2} - M_{0,i+1} + qM_{0i} = 1.$$

Solving this equation and using the relations $M_{00} = 0$ and $M_{01} = 1/p$,
we find

$$M_{0j} = \frac{q}{(q - p)^2}\,(r^j - 1) - \frac{j}{q - p}.$$

Algebraic manipulation yields the alternate formula

$$M_{0j} = \frac{1}{p}\sum_{k=0}^{j-1} r^k (j - k).$$

Since $M_{i0} = M_{ij} + M_{j0}$ when $i > j$ and since $M_{0i} + M_{ij} = M_{0j}$
when $i < j$, we may summarize our results as follows:

$$M_{ij} = \begin{cases} \dfrac{i - j}{q - p} & \text{if } i \ge j, \\[2ex] \dfrac{q}{(q - p)^2}\,(r^j - r^i) - \dfrac{j - i}{q - p} & \text{if } i < j. \end{cases}$$
EXAMPLE 2: Basic Example. 


The vector $\beta$ defined for the basic example has the property that
$\beta P = \beta$ if and only if $\lim_{i\to\infty} \beta_i = 0$. Furthermore, P is recurrent if
and only if $\lim_i \beta_i = 0$. When P is recurrent, it is null if $\sum_i \beta_i$ is
infinite and ergodic if $\sum_i \beta_i$ is finite. In the ergodic case the regular
measure $\alpha$ for which $\alpha\mathbf{1} = 1$ has entries

$$\alpha_i = \frac{\beta_i}{\sum_k \beta_k}.$$

Entries $M_{ij}$ of the mean first passage time matrix for the basic
example satisfy the equations

$$M_{0i} + M_{ij} = M_{0j} \qquad \text{for } i < j$$

and

$$M_{ij} = M_{i0} + M_{0j} \qquad \text{for } i > j.$$

Since

$$M_{0i} + M_{i0} = \frac{\bar M_{00}}{\beta_i} = \frac{1}{\beta_i}\sum_k \beta_k \qquad \text{for } i > 0,$$

it is sufficient to compute $M_{0i}$ for the chain. Taking the statements
{the process moves from 0 to $k \le i - 1$ and then to zero, the process
moves directly to i} as a set of alternatives, we find that

$$M_{0i} = \beta_i\, i + \sum_{k=0}^{i-1} \beta_k q_k (k + 1 + M_{0i}).$$

Solving this equation with the aid of the relation $\beta_k q_k = \beta_k - \beta_{k+1}$,
we obtain

$$M_{0i} = \frac{1}{\beta_i}\sum_{k<i} \beta_k.$$

Therefore, for $i > 0$,

$$M_{i0} = \frac{1}{\beta_i}\sum_{k \ge i} \beta_k.$$

The general entries may be computed from

$$M_{ij} = \begin{cases} M_{0j} - M_{0i} & \text{if } i < j, \\ M_{i0} + M_{0j} & \text{if } i > j. \end{cases}$$
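The formulas for the basic example can be checked on a finite instance. The sketch below assumes hypothetical values $p_0 = 1/2$, $p_1 = 2/3$, $p_2 = 3/4$, $p_3 = 0$. With these values $\beta = (1, 1/2, 1/3, 1/4)$, and only the states 0 to 3 occur. The code computes M exactly by the absorbing-chain method used in the proof of Theorem 6-43. It then compares the result with the expressions for $M_{0i}$, $M_{i0}$, $M_{0i} + M_{i0}$ and the general entries.

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fr(x) for x in row] + [Fr(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def mean_first_passage(P):
    """M_ij = M_i[t_j]; column j solves (I - Q) xi = 1 with j made absorbing."""
    n = len(P)
    M = [[Fr(0)] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        A = [[int(a == b) - P[a][b] for b in others] for a in others]
        for i, xi in zip(others, solve(A, [1] * len(others))):
            M[i][j] = xi
    return M

# Hypothetical finite basic example: from i move to i + 1 w.p. p_i, else to 0.
p_up = [Fr(1, 2), Fr(2, 3), Fr(3, 4), Fr(0)]
K = len(p_up)
P = [[Fr(0)] * K for _ in range(K)]
for i, p_i in enumerate(p_up):
    P[i][0] += 1 - p_i
    if i + 1 < K:
        P[i][i + 1] += p_i

beta = [Fr(1)]                       # beta_i = p_0 p_1 ... p_{i-1}
for p_i in p_up[:-1]:
    beta.append(beta[-1] * p_i)

M = mean_first_passage(P)
ok_0i = all(M[0][i] == sum(beta[:i]) / beta[i] for i in range(1, K))
ok_i0 = all(M[i][0] == sum(beta[i:]) / beta[i] for i in range(1, K))
ok_commute = all(M[0][i] + M[i][0] == sum(beta) / beta[i] for i in range(1, K))
ok_general = all(M[i][j] == (M[0][j] - M[0][i] if i < j else M[i][0] + M[0][j])
                 for i in range(K) for j in range(K) if i != j)
print(beta, ok_0i, ok_i0, ok_commute, ok_general)
```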


8. Reverse Markov chains 


Let $\{x_n\}$ be the outcome functions for a denumerable Markov process
defined on a space $\Omega$ and with range in S. The outcome functions
appear in a certain order and represent the forward passage of time. 
One may well wonder, however, if, when the functions are looked at in 
reverse order, the process is in any sense still Markovian. It is the 
purpose of this section to discuss this question; as a by-product of the 
discussion, we shall gain an interpretation for the dual of an ergodic 
Markov chain. 

The sense in which a Markov process reversed in time is still a Markov 
process is the following. 


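Proposition 6-44: Let $\{x_n\}$ be a denumerable Markov process and let
N be a fixed positive integer. Define $y_n = x_{N-n}$ for $0 \le n \le N$ and
$y_n$ = “stop” for $n > N$. Then the functions $y_n$ are the outcome
functions for a denumerable Markov process with the same state space
with “stop” adjoined.

PROOF: We shall show that the functions $y_n$ satisfy the Markov
property. Clearly, this needs to be checked only for $n \le N$. If
$\Pr[y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}] > 0$, then

$$\begin{aligned}
\Pr[y_n = c_n \mid y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}]
&= \Pr[x_{N-n} = c_n \mid x_N = c_0 \wedge \cdots \wedge x_{N-n+1} = c_{n-1}] \\
&= \frac{\Pr[x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1} \wedge \cdots \wedge x_N = c_0]}{\Pr[x_{N-n+1} = c_{n-1} \wedge \cdots \wedge x_N = c_0]}.
\end{aligned}$$

The numerator is

$$\Pr[x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1}] \times \Pr[x_{N-n+2} = c_{n-2} \mid x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1}] \times \cdots \times \Pr[x_N = c_0 \mid x_{N-n} = c_n \wedge \cdots \wedge x_{N-1} = c_1],$$

which by the Markov property is

$$\Pr[x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1}]\,\Pr[x_{N-n+2} = c_{n-2} \mid x_{N-n+1} = c_{n-1}] \cdots \Pr[x_N = c_0 \mid x_{N-1} = c_1].$$

The denominator similarly reduces to

$$\Pr[x_{N-n+1} = c_{n-1}]\,\Pr[x_{N-n+2} = c_{n-2} \mid x_{N-n+1} = c_{n-1}] \cdots \Pr[x_N = c_0 \mid x_{N-1} = c_1].$$

Dividing, we obtain

$$\begin{aligned}
\Pr[y_n = c_n \mid y_0 = c_0 \wedge \cdots \wedge y_{n-1} = c_{n-1}]
&= \frac{\Pr[x_{N-n} = c_n \wedge x_{N-n+1} = c_{n-1}]}{\Pr[x_{N-n+1} = c_{n-1}]} \\
&= \Pr[x_{N-n} = c_n \mid x_{N-n+1} = c_{n-1}] \\
&= \Pr[y_n = c_n \mid y_{n-1} = c_{n-1}].
\end{aligned}$$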


It is not true in general that, if the original process is a Markov chain
P, then the new process is also a Markov chain. Let P be started with
distribution $\pi$. We then have, if $\Pr[y_{n-1} = i] > 0$,

$$\begin{aligned}
\Pr[y_n = j \mid y_{n-1} = i] &= \Pr_\pi[x_{N-n} = j \mid x_{N-n+1} = i] \\
&= \frac{\Pr_\pi[x_{N-n} = j \wedge x_{N-n+1} = i]}{\Pr_\pi[x_{N-n+1} = i]} \\
&= \frac{(\pi P^{N-n})_j\, P_{ji}}{(\pi P^{N-n+1})_i}.
\end{aligned}$$

The last quantity on the right need not be independent of n. Nevertheless,
if P is ergodic, there is a case where we can state a positive result:
a result which gives us an interpretation for the dual of P.


Proposition 6-45: Let $\{x_n\}$ be the outcome functions for an ergodic
chain P, let N be a fixed positive integer, and let $\alpha$ be the unique
P-regular probability measure. If P is started with distribution $\alpha$, then
the process $\{y_n = x_{N-n},\ 0 \le n \le N\}$ is an initial segment of the
Markov chain with transition matrix $\hat P$ and with starting distribution $\alpha$.

PROOF: If $\Pr[y_{n-1} = i] > 0$, then $\alpha P^{N-n} = \alpha$ and

$$\Pr[y_n = j \mid y_{n-1} = i] = \frac{(\alpha P^{N-n})_j\, P_{ji}}{(\alpha P^{N-n+1})_i} = \frac{\alpha_j P_{ji}}{\alpha_i} = \hat P_{ij}$$

independently of n for $n \le N$.

The motivation for calling $\hat P$ the reverse chain when P is recurrent
now becomes clear.
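Propositions 6-44 and 6-45 can be compared numerically. The sketch below uses a hypothetical doubly stochastic three-state chain that is not reversible. For such a chain $\alpha$ is uniform and $\hat P = P^T \ne P$. The code computes the backward transition probabilities $\Pr[y_n = j \mid y_{n-1} = i] = (\pi P^{N-n})_j P_{ji}/(\pi P^{N-n+1})_i$ exactly. Started from $\alpha$, they equal $\hat P_{ij} = \alpha_j P_{ji}/\alpha_i$ for every n. Started from a point mass, they change with n.

```python
from fractions import Fraction as Fr

def step(v, P):
    return [sum(v[k] * P[k][j] for k in range(len(P))) for j in range(len(P))]

def backward_kernel(pi, P, N, n):
    """Rows i of Pr[y_n = j | y_{n-1} = i] for y_n = x_{N-n};
    a row is None when Pr[y_{n-1} = i] = 0."""
    a = pi
    for _ in range(N - n):
        a = step(a, P)          # distribution of x_{N-n}
    b = step(a, P)              # distribution of x_{N-n+1}
    size = len(P)
    return [[a[j] * P[j][i] / b[i] for j in range(size)] if b[i] != 0 else None
            for i in range(size)]

t, s = Fr(2, 3), Fr(1, 3)
P = [[0, t, s],
     [s, 0, t],
     [t, s, 0]]   # hypothetical; column sums are 1, so alpha is uniform
alpha = [Fr(1, 3)] * 3
P_hat = [[alpha[j] * P[j][i] / alpha[i] for j in range(3)] for i in range(3)]

N = 5
stationary_ok = all(backward_kernel(alpha, P, N, n) == P_hat for n in range(1, N + 1))
point_mass = [Fr(1), Fr(0), Fr(0)]
kernels = [backward_kernel(point_mass, P, N, n) for n in (1, N)]
print(P_hat == [list(col) for col in zip(*P)], stationary_ok, kernels[0] != kernels[1])
```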
9. Problems


1. Compute for the Land of Oz example $P^2$, $P^4$, and $P^8$. What is $A =
\lim P^n$? Show that each row of A is a regular measure and that each
column is a regular function.


2. Let P be the following transition matrix on the states 1 to 6:

[6 × 6 transition matrix; entries illegible in this copy]

The process is started in state 1. Find the probability of being in the
various states in the long run.


3. In the basic example, let

[formula for $p_i$; illegible in this copy]

Is the chain transient, null, or ergodic?


4. Prove that $\alpha = \mathbf{1}^T$ is regular for any sums of independent random
variables process. Give a careful statement as to the existence of
transient, null, and ergodic examples.


5. Establish the following relationships between a chain with transition
matrix P and one with matrix $\hat P$:

(a) If P is transient, then $\hat P$ is transient.

(b) If P is recurrent, then $\hat P$ is recurrent and $\alpha_{\hat P} = \alpha_P$.

(c) If P is ergodic, then $\hat P$ is ergodic, but the converse is not always true.


6. Prove that if a recurrent P has column-sums equal to 1, then $\hat P = P^T$.
7. Consider sums of independent random variables on the integers with
$p_{-1} = \frac13$ and $p_1 = \frac23$. Choose two essentially different positive regular
measures $\alpha$, and show that each gives a correct expression for ${}^iN_{ij}$ in
Corollary 6-20.


8. Show that if $P_{ii} > 0$ for a single state in a recurrent chain, then the chain
is noncyclic.


. Show by an example that M y = «M ala; need not be true. 


10. Show that in an ergodic chain $\alpha M$ may be either finite-valued or infinite-valued.


11. Determine whether the following chain is transient or recurrent:

[transition matrix; entries illegible in this copy]

If transient: Put into standard form, and find N, B, and a.
If recurrent: Is it cyclic? Find $\alpha$, M, $\hat P$, $\hat M$.


12. Do the same for

$$P = \begin{pmatrix} 1 - c & c \\ 1 & 0 \end{pmatrix}, \qquad 0 < c < 1.$$


13. Find $\alpha$ and M for an independent trials process by the methods of this
chapter, and check your answers by a direct computation. [See Problem
5 in Chapter 4.]


14. (a) Complete the work of finding M for the basic example.
(b) Find the reverse of the basic example (when recurrent), and compute
M for this chain.


15. A light bulb in a fixture lasts j time units with probability $f_j$. It is
replaced with a similar bulb when it burns out. Assume that $\sum f_j = 1$,
$f_0 = 0$, and $f_1 > 0$. Let $x_n$ be the length of time that the bulb in use at
time n has lasted (taken to be 0 if there is a replacement at time n).
Show that $\{x_n\}$ is the set of outcome functions for a Markov chain and
discuss the connection with the basic example. Show that the probability
that a bulb is replaced at time n tends to a limit as $n \to \infty$.


Problems 16 to 26 refer to sums of independent random variables on the
circle. Mark n ($n \ge 3$) points on a circle, labeled $i = 0, 1, \ldots, n - 1$
clockwise. The process either moves one step clockwise with probability $\frac23$, or it
moves one step counterclockwise with probability $\frac13$.

16. Prove that the chain is ergodic. Is it cyclic?
17. Find a positive regular measure $\alpha$ with $\alpha\mathbf{1} = 1$. Interpret it.
18. Construct the reverse chain.
19. Compute M by means of Theorem 6-43. [It is sufficient to find $M_{i0}$.]
20. Show that for large n,

$$M_{i0} \sim 3\left(n - i - n\left(\tfrac12\right)^i\right).$$

Compare this result with the absorption times of a drunkard's walk on
$\{0, 1, \ldots, n\}$ with $p = \frac23$.
21. Show that the approximation in the previous problem is excellent for
$n = 50$.
22. Use the approximation of Problem 20 to show that the maximum value
of $M_{i0}$ occurs approximately at

$$i = \frac{\log n}{\log 2}.$$

Check this conclusion for $n = 50$.
23. Let n be even, and let E be the set of even-numbered states. Compute
24. For $n = 3$, compute $P^2$, $P^4$, $P^8$, and $P^{16}$. What is the limit of $P^n$?
25. Repeat Problem 24 for $n = 4$.
26. Show that $\alpha M = c_n\mathbf{1}^T$, and find an asymptotic expression for $c_n$.


CHAPTER 7 


INTRODUCTION TO POTENTIAL THEORY 


1. Brownian motion 


One of the fruitful achievements of probability theory in recent years 
has been the recognition that two seemingly unrelated theories in 
physics—one for Brownian motion and one for potentials—are mathe- 
matically equivalent. That is, the results of the two theories are in 
one-to-one correspondence and any proof of a result in one theory can 
be translated directly into a proof of the corresponding result in the 
other theory. In this chapter we shall sketch how this equivalence 
comes about, and we shall see that Brownian motion is a process which 
is like a Markov chain except that it does not have a denumerable state 
space and time does not proceed in discrete steps. The details of this 
equivalence can be found in Knapp [1965]. The important thing to 
notice will be that the definitions of potential-theoretic concepts in 
terms of Brownian motion do not depend on isolated specific properties 
of the process but depend only on the Markovian character of Brownian 
motion. In other words, there is reasonable hope of defining for an 
arbitrary Markov chain a potential theory in which analogs of the 
classical theorems hold. 

We begin by describing Brownian motion. In 1826 the botanist 
Robert Brown observed that microscopic particles, when left alone in a 
liquid, are seen to move constantly in the fluid along erratic paths. 
Much later Albert Einstein investigated this movement of particles 
from a theoretical point of view. Einstein was able to derive statistical 
laws which estimate how a large number of particles spread over a 
period of time, and his predictions were verified. 

In setting up a probabilistic model for this so-called Brownian motion, 
we simply replace Einstein’s estimate of what happens to a large 
number of particles by a probability for what happens to one particle. 
We are then to require that 


$$\Pr[\text{particle started at } u \text{ is in } E \text{ at time } t] = \int_E \frac{1}{(2\pi t)^{3/2}}\, e^{-|u-y|^2/2t}\, dy,$$
where E is a Borel set in three-dimensional (Euclidean) space $R^3$ and
$|u - y|$ denotes the Euclidean distance from u to y. If we use the
notation $\Pr_u[x_t \in E]$ for the left side and the notation $p^t(u, E)$ for the
right side, we have

$$\Pr_u[x_t \in E] = p^t(u, E).$$


By Theorem 1-41, $p^t(u, E)$ is a measure depending on t and u and
defined on the smallest Borel field containing the open sets of $R^3$.
Therefore, we may write

$$\Pr_u[x_t \in E] = \int_E p^t(u, dy).$$


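The physical theory also makes us require that if $t_1 < t_2 < \cdots < t_m$,
then $\{x_{t_1}, x_{t_2} - x_{t_1}, \ldots, x_{t_m} - x_{t_{m-1}}\}$ should behave like a set of
independent random variables, with $x_{t+s} - x_t$ having the same distribution
as $x_s$ has for a particle started at the origin. That is, we require that

$$
\begin{aligned}
\Pr_u[x_{t_1} \in E_1 \wedge \cdots \wedge (x_{t_m} - x_{t_{m-1}}) \in E_m]
&= \Pr_u[x_{t_1} \in E_1] \cdots \Pr_u[x_{t_m} - x_{t_{m-1}} \in E_m] \\
&= \Pr_u[x_{t_1} \in E_1] \cdots \Pr_0[x_{t_m - t_{m-1}} \in E_m].
\end{aligned}
$$

This identity implies that we must have

$$
\Pr_u[x_{t_1} \in E \wedge x_{t_2} \in F \wedge \cdots \wedge x_{t_{m-1}} \in G \wedge x_{t_m} \in H]
= \int_E p^{t_1}(u, dw) \int_F p^{t_2 - t_1}(w, dz) \cdots \int_G p^{t_{m-1} - t_{m-2}}(v, dy) \int_H p^{t_m - t_{m-1}}(y, dx).
$$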


We note that with these various requirements we have given more than
one definition for $\Pr_u[x_t \in E]$ and that we must check, for example,
that, for $s < t$,

$$\Pr_u[x_s \in \mathbb{R}^3 \wedge x_t \in E] = \Pr_u[x_t \in E]$$
and

$$\Pr_u[x_s \in F \wedge x_t \in \mathbb{R}^3] = \Pr_u[x_s \in F].$$


Such identities can be verified by direct calculation. It should not be 
too surprising that such consistency conditions arise since they arose 
with denumerable stochastic processes earlier: In the proof of the 
Kolmogorov Extension Theorem in Chapter 2 we required that the 
measures on cylinder sets all be consistent. 
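The consistency conditions can also be checked numerically. The sketch below is a minimal illustration and works in one dimension. This loses nothing essential, because the three-dimensional density is a product of three one-dimensional ones. Integrals are approximated by the midpoint rule on a wide window. The code checks two things:

- $p^t(u, \cdot)$ has total mass 1.
- The semigroup identity $\int p^s(u, z)\, p^t(z, y)\, dz = p^{s+t}(u, y)$ holds. This identity is what makes the two definitions of the distribution of $x_{s+t}$ agree.

The function names are illustrative.

```python
import math

def p(t, u, y):
    """One-dimensional Brownian transition density (2*pi*t)^(-1/2) * exp(-(u - y)^2 / 2t)."""
    return math.exp(-(u - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def chapman_kolmogorov_gap(s, t, u, y, width=40.0):
    """|int p^s(u, z) p^t(z, y) dz - p^(s+t)(u, y)|, with z restricted to a wide window around u."""
    lhs = integrate(lambda z: p(s, u, z) * p(t, z, y), u - width, u + width)
    return abs(lhs - p(s + t, u, y))

print(integrate(lambda y: p(0.7, 0.3, y), -40.0, 40.0))   # total mass, close to 1
print(chapman_kolmogorov_gap(0.5, 1.5, 0.0, 1.2))          # close to 0
```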


Now for any denumerable Markov chain $P$ we have

(1) $0 \le P_{ij} = \Pr_i[x_1 = j]$,

(2) $\sum_j P_{ij} = 1$,

(3a) $\Pr_i[x_1 = j \wedge x_2 = k \wedge \cdots \wedge x_{n-1} = r \wedge x_n = s] = P_{ij} P_{jk} \cdots P_{rs}$.

The last equality (3a) implies and is implied by

(3b) $\Pr_i[x_1 \in E \wedge x_2 \in F \wedge \cdots \wedge x_{n-1} \in G \wedge x_n \in H] = \sum_{j \in E} P_{ij} \sum_{k \in F} P_{jk} \cdots \sum_{s \in H} P_{rs}$.
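Here (3b) follows from (3a) by summation. The sketch below illustrates this on a small example. It takes a hypothetical three-state transition matrix and computes $\Pr_i[x_1 \in E \wedge x_2 \in F \wedge x_3 \in H]$ in two ways:

- by enumerating paths and multiplying transition probabilities as in (3a);
- by the nested sums of (3b).

The matrix and the sets are made-up examples.

```python
from itertools import product

# Hypothetical 3-state transition matrix (each row sums to 1).
P = [[0.2, 0.5, 0.3],
     [0.4, 0.4, 0.2],
     [0.1, 0.6, 0.3]]

def path_probability(i, path):
    """(3a): Pr_i[x_1 = path[0], ..., x_n = path[-1]] as a product of entries of P."""
    prob, current = 1.0, i
    for state in path:
        prob *= P[current][state]
        current = state
    return prob

def by_paths(i, sets):
    """Sum (3a) over every path whose k-th state lies in the k-th set."""
    return sum(path_probability(i, path) for path in product(*sets))

def by_nested_sums(i, sets):
    """(3b): sum_{j in E} P_ij sum_{k in F} P_jk ..., evaluated recursively."""
    def inner(current, remaining):
        if not remaining:
            return 1.0
        return sum(P[current][j] * inner(j, remaining[1:]) for j in remaining[0])
    return inner(i, sets)

E, F, H = [0, 1], [1, 2], [0, 2]
print(by_paths(0, [E, F, H]), by_nested_sums(0, [E, F, H]))
```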
It is easy to prove both the Markov property and the Markov chain
property from (3a), and hence (1), (2), and (3b) give an equivalent
definition of Markov chain. The analogous statements for Brownian
motion are

(1′) $0 \le \int_E p^t(u, dy) = \Pr_u[x_t \in E]$,

(2′) $\Pr_u[x_t \in \mathbb{R}^3] = \int_{\mathbb{R}^3} p^t(u, dy) = 1$,

and

(3′) $\Pr_u[x_{t_1} \in E \wedge x_{t_2} \in F \wedge \cdots \wedge x_{t_m} \in H] = \int_E p^{t_1}(u, dw) \int_F p^{t_2 - t_1}(w, dz) \cdots \int_H p^{t_m - t_{m-1}}(y, dx)$.


As expected, these statements imply that the position of a particle at
time $t + s$ depends only on $s$ and the particle's position at time $t$, and
not on the value of $t$ or on what happened to the particle before time $t$.
(This assertion can be formulated precisely in terms of means of 
functions given a Borel field, which are a technical generalization of 
conditional means of functions given a partition.) 

As with denumerable Markov chains we need not require that a
Brownian motion particle be started deterministically at a state $u$.
If we start the particle according to probabilities assigned by a measure
$\mu$ on $\mathbb{R}^3$, then we have

$$
\Pr_\mu[x_t \in E] = \int_{\mathbb{R}^3} \int_E \frac{1}{(2\pi t)^{3/2}}\, e^{-|u-y|^2/2t}\, dy\, d\mu(u)
= \int_{\mathbb{R}^3} \int_E p^t(u, dy)\, d\mu(u).
$$


A similar expression holds for the probability of being in a finite 
sequence of sets at specified times. 

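In Section 3 we shall need a formal definition of Brownian motion,
and we use the formula for $\Pr_\mu[x_t \in E]$ to motivate it. We define a
transformation $P^t$ of the measures $\mu$ on $\mathbb{R}^3$ with $\mu(\mathbb{R}^3) = 1$ into
themselves by

$$
(\mu P^t)(E) = \Pr_\mu[x_t \in E] = \int_E \left[ \int_{\mathbb{R}^3} \frac{1}{(2\pi t)^{3/2}}\, e^{-|u-y|^2/2t}\, d\mu(u) \right] dy.
$$

Later we shall replace the expression in brackets by $f(y, t)$ for simplicity
of notation. That $\mu P^t$ is a measure follows from Theorem 1-41, and
that $(\mu P^t)(\mathbb{R}^3) = 1$ follows from the identity

$$
\begin{aligned}
(\mu P^t)(\mathbb{R}^3) &= \int_{\mathbb{R}^3} \left[ \int_{\mathbb{R}^3} \frac{1}{(2\pi t)^{3/2}}\, e^{-|u-y|^2/2t}\, dy \right] d\mu(u) \\
&= \int_{\mathbb{R}^3} \left[ \int_{\mathbb{R}^3} \frac{1}{(2\pi t)^{3/2}}\, e^{-|y|^2/2t}\, dy \right] d\mu(u) = \mu(\mathbb{R}^3) = 1
\end{aligned}
$$

after a change of variable.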


Definition 7-1: The totality of theorems about the operators $P^t$, the
measures $\mu$ on $\mathbb{R}^3$ with $\mu(\mathbb{R}^3) = 1$, and quantities definable in terms of
them and properties of $\mathbb{R}^3$ is called Brownian motion theory.

We immediately extend $P^t$ by linearity to be defined on all finite
measures on $\mathbb{R}^3$ and all differences of two finite measures.


2. Potential theory 


Classical potential theory begins as a study of Coulomb’s law of 
attraction of electrical charges in physics. This law states that every 
two charges in the universe attract (or repel) each other with a force 
whose direction is the line connecting them and whose magnitude is 
proportional to the magnitude of each of them and inversely propor- 
tional to the square of the distance between them. That is, 


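$$F = c_0\, \frac{q_1 q_2}{r^2},$$

where $q_1$ and $q_2$ are the two charges, $r$ is the distance between them, and $c_0$ is a constant depending on the units. As an aid in the study,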
one introduces the notion of potential: The potential at a point x due 
to a charge q is the work (or energy) required to bring a unit charge 
from infinity to the point x. It can be shown that this potential is 
independent of the path along which the charge is brought to the point 
x and that its value is 

$$\frac{1}{2\pi}\, \frac{q}{|x - x_0|},$$

where $x_0$ is the position of the charge and where the constant $1/2\pi$ has
been fixed after a certain choice of units.

More generally one defines a charge distribution to be the difference
of any two finite measures defined on the Borel sets of $\mathbb{R}^3$, that is, on the
smallest Borel field in $\mathbb{R}^3$ containing all open sets. The potential at $x$
due to the charge distribution is again the work required to bring a
unit charge from infinity to the point x. Since force (and hence work) 


170 Introduction to potential theory 


are additive, the potential due to a charge distribution consisting of
charges $q_1, \ldots, q_n$ at points $z_1, \ldots, z_n$ is

$$\frac{1}{2\pi} \sum_{i=1}^{n} \frac{q_i}{|x - z_i|}.$$

Passing to the limit in an appropriate sense, we would expect the
potential due to an arbitrary charge distribution $\mu$ to be

$$\frac{1}{2\pi} \int_{\mathbb{R}^3} \frac{d\mu(y)}{|x - y|}.$$


After checking that such an expression is always well defined, we shall 
define a potential to be any function of this form. 
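For finitely many point charges, the potential reduces to the finite sum above. The small sketch below assumes the units in which the constant is $1/2\pi$ and uses Python's `math.dist` for the Euclidean distance. The dipole is a made-up example. Its potential vanishes on the plane equidistant from its two charges.

```python
import math

def point_charge_potential(x, charges):
    """(1/2pi) * sum_i q_i / |x - z_i| at a point x of R^3.

    charges is a list of (q_i, z_i) pairs with z_i a 3-tuple; x must differ from every z_i.
    """
    return sum(q / math.dist(x, z) for q, z in charges) / (2.0 * math.pi)

# Hypothetical dipole: charge +1 at the origin and -1 at (1, 0, 0).
dipole = [(1.0, (0.0, 0.0, 0.0)), (-1.0, (1.0, 0.0, 0.0))]
print(point_charge_potential((0.5, 2.0, 0.0), dipole))   # point on the bisecting plane x = 1/2
```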


Lemma 7-2: If $\mu$ is a charge distribution, then

$$g(x) = \frac{1}{2\pi} \int_{\mathbb{R}^3} \frac{1}{|x - y|}\, d\mu(y)$$

is finite a.e. with respect to Lebesgue measure.


Proof: It suffices to prove the lemma for the case where $\mu$ is a
measure, since the general case follows by taking differences. Let $K_n$
denote the closed ball about the origin of radius $n$, and form

$$\int_{K_n} g(x)\, dx = \int_{K_n} \left[ \frac{1}{2\pi} \int_{\mathbb{R}^3} \frac{1}{|x - y|}\, d\mu(y) \right] dx.$$

By Fubini's Theorem we may interchange the order of integration to
get

$$\int_{K_n} g(x)\, dx = \frac{1}{2\pi} \int_{\mathbb{R}^3} \left[ \int_{K_n} \frac{1}{|x - y|}\, dx \right] d\mu(y).$$

The inside integral on the right, as a function of $y$, is bounded by its value
at $y = 0$, which is some finite number $c$. Thus the right side does not exceed

$$\frac{1}{2\pi}\, c\, \mu(\mathbb{R}^3) < \infty,$$

and, since $g \ge 0$, $g$ must be finite a.e. on $K_n$. Since the countable union of the sets
$K_n$ is $\mathbb{R}^3$, we conclude that $g$ is finite a.e.
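The constant $c$ in the proof can be made explicit. Its value at $y = 0$ follows from spherical coordinates, which gives a concrete bound on the integral of $g$ over $K_n$:

$$
c = \int_{K_n} \frac{dx}{|x|} = \int_0^n \frac{1}{r}\, 4\pi r^2\, dr = 2\pi n^2,
\qquad\text{so}\qquad
\int_{K_n} g(x)\, dx \le \frac{1}{2\pi}\, 2\pi n^2\, \mu(\mathbb{R}^3) = n^2\, \mu(\mathbb{R}^3).
$$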


Definition 7-3: The function

$$\frac{1}{2\pi} \int_{\mathbb{R}^3} \frac{1}{|x - y|}\, d\mu(y)$$

for $\mu$ a charge distribution will be called the potential of $\mu$.
The class of theorems relating charges and potentials and quantities
definable in terms of them and properties of $\mathbb{R}^3$ is called classical
potential theory. The operator transforming a charge into its potential
is called the potential operator. The kernel of the potential operator
is called the Green's function.


As we have defined it, potential theory contains the subject known in 
physics as electrostatics. Our definition includes the notions of 
distance, charge, and potential, and all the quantities commonly arising 
in electrostatics are definable in terms of these three notions. As an 
illustration, Table 7-1 shows how some of the quantities arising in 
electrostatics are related dimensionally to distance, charge, and 
potential. The table uses the notation

distance = x, time = t, mass = m, charge = q    (for the ordinary dimensions)
distance = x, charge = q, potential = V         (for the potential-theoretic dimensions)


TABLE 7-1. DIMENSIONS OF ELECTROSTATIC CONCEPTS

Concept      Dimensions           Potential-Theoretic Dimensions
Capacity     q^2 t^2/(m x^2)      q/V
Charge       q                    q
Energy       m x^2/t^2            Vq
Field        m x/(t^2 q)          V/x
Force        m x/t^2              Vq/x
Potential    m x^2/(t^2 q)        V


We give four examples to illustrate how concepts may be defined 
explicitly in terms of distance, charge, and potential. 


(1) We can reasonably ask what the total amount of work required 
to assemble a charge distribution is if only an “infinitesimal” amount 
of charge is brought into position at one time. The way to compute 
this amount of work is to integrate the potential function against the 
charge distribution, provided the integral exists. Thus we define the 
energy of a charge distribution to be the integral of its potential with 
respect to the charge, provided the integral exists. 

(2) The total charge of a charge distribution $\mu$ is $\mu(\mathbb{R}^3)$.

(3) If a total amount of charge $q$ is put on a piece of conducting metal
in $\mathbb{R}^3$, the charge will redistribute itself in such a way that the potential
is a constant on the set where the metal is. The situation where the
potential is constant on the metal is the one which minimizes energy
among all charges $\mu$ with total charge $q$ and with $\mu$ vanishing away from
the set where the metal is, and this situation is referred to as equilibrium.
We define an equilibrium potential for a closed set $E$ to be a potential
which is 1 on $E$ and which comes from a charge distribution which has
all its charge on $E$. An equilibrium set is a set which has an equilibrium
potential.

(4) The capacity of a conductor in $\mathbb{R}^3$ is defined as the total amount
of charge needed to produce a unit potential on the set where the
conductor is. We thus define the capacity of any equilibrium set to be
the total charge of a charge distribution producing an equilibrium
potential.


To indicate the directions in which classical potential theory leads, 
we shall state without proof some of the theorems in the subject. The 
support of a charge is defined to be the complement of the union of all 
open sets U with the property that the charge vanishes on U and every 
measurable subset of U. 


(1) Uniqueness of charge: A potential uniquely determines its charge. 

(2) Determination of potential: A potential is completely determined 
by its values on the support of its charge. 

(3) Uniqueness of equilibrium potential: A set $E$ has at most one
equilibrium potential. (This assertion is a corollary of (2).)

(4) Characterization of equilibrium potential: The equilibrium potential
for an equilibrium set $E$ is equivalent to the pointwise infimum of all
potentials which have non-negative charges and which dominate the
constant function 1 on $E$.

(5) Principle of domination: Let $h$ and $g$ be potentials arising from
non-negative charges $\eta$ and $\mu$, respectively. If $h$ dominates $g$ on the
support of $\mu$, then $h$ dominates $g$ everywhere and $\eta(\mathbb{R}^3) \ge \mu(\mathbb{R}^3)$.

(6) Principle of balayage: If $g$ is a potential with a non-negative
charge and if $E$ is a closed set in $\mathbb{R}^3$, then there is a unique potential $\bar g$
with a non-negative charge with support in $E$ such that $\bar g = g$ on $E$.
The potential $\bar g$ (called the balayage potential) satisfies $\bar g \le g$
everywhere, and its total charge does not exceed the total charge of $g$.
The balayage potential is equivalent to the pointwise infimum of all
potentials which have non-negative charges and which dominate $g$ on
$E$. It is equivalent to the supremum of all potentials which are
dominated by $g$ on $E$ and whose charges have support in $E$.

(7) Principle of lower envelope: The pointwise infimum of potentials
with non-negative charges is equivalent to a potential with a non-negative charge.

(8) Non-negative potentials: The charge distribution of a non-negative
potential has non-negative total charge.

(9) Energy of equilibrium potential: If $E$ is an equilibrium set of finite
energy, then the equilibrium potential minimizes energy among all
potentials whose charges have support in $E$ and whose total charge is
equal to the capacity of $E$.


3. Equivalence of Brownian motion and potential theory 


Kakutani [1944] observed that several of the basic quantities of 
potential theory, like equilibrium potential, had simple probabilistic 
interpretations in terms of Brownian motion. If $E$ is an equilibrium
set, the value of the equilibrium potential at $x$ is the probability that a
Brownian motion process started at $x$ ever hits the set $E$. Doob and
Hunt extended Kakutani’s work, and it gradually became clear that in 
a certain sense Brownian motion and potential theory were really the 
same subject. 

To say that they are exactly the same would be to say that every 
theorem about Brownian motion should interest a person studying 
potential theory, and conversely. Although it is doubtful that this 
situation is the case at present, it is certainly true that modern develop- 
ments in the two subjects are moving more and more in this direction. 

We shall now show that there is a natural way in terms of Brownian 
motion of obtaining the operator mapping charges into potentials, and 
that, conversely, from the potential operator it is possible to recover the 
family $\{P^t\}$ of transition operators for Brownian motion. These facts
make it clear that in a technical sense the two theories are identical. 

The proof in the first direction is easy and is completed by Proposition
7-4. We recall from the definition of $\mu P^t$ that

$$(\mu P^t)(E) = \int_E f(x, t)\, dx$$

for a certain function $f(x, t)$.


Proposition 7-4: Every theorem about potentials can be formulated
as a theorem about Brownian motion. Specifically, if $\mu$ is a charge,
then the potential $g$ of $\mu$ satisfies

$$g(x) = \lim_{T \to \infty} \int_0^T f(x, t)\, dt,$$

where

$$(\mu P^t)(E) = \int_E f(x, t)\, dx.$$


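Proof: We may assume that $\mu \ge 0$ without loss of generality. Then

$$\int_0^T f(x, t)\, dt = \int_0^T \int_{\mathbb{R}^3} \frac{1}{(2\pi t)^{3/2}}\, e^{-|x-y|^2/2t}\, d\mu(y)\, dt.$$

By Fubini's Theorem we may interchange the order of integration.
The above expression is

$$= \int_{\mathbb{R}^3} \left[ \int_0^T \frac{1}{(2\pi t)^{3/2}}\, e^{-|x-y|^2/2t}\, dt \right] d\mu(y).$$

We make the change of variable on $t$ which sends $|x - y|^2/t$ into $w^2$.
The expression becomes

$$= \int_{\mathbb{R}^3} \frac{2}{(2\pi)^{3/2}\, |x - y|} \left[ \int_{|x-y|/\sqrt{T}}^{\infty} e^{-w^2/2}\, dw \right] d\mu(y).$$

By the Monotone Convergence Theorem,

$$
\begin{aligned}
\lim_{T \to \infty} \int_0^T f(x, t)\, dt
&= \int_{\mathbb{R}^3} \frac{2}{(2\pi)^{3/2}\, |x - y|} \left[ \int_0^{\infty} e^{-w^2/2}\, dw \right] d\mu(y) \\
&= \frac{1}{2\pi} \int_{\mathbb{R}^3} \frac{1}{|x - y|}\, d\mu(y) = g(x).
\end{aligned}
$$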
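The change of variable in the proof can be carried out in closed form. For $r = |x - y| > 0$,

$$\int_0^T (2\pi t)^{-3/2} e^{-r^2/2t}\, dt = \frac{\operatorname{erfc}\bigl(r/\sqrt{2T}\bigr)}{2\pi r},$$

which increases to $1/(2\pi r)$ as $T \to \infty$. The sketch below checks this closed form against direct midpoint-rule quadrature in $t$. It is a numerical illustration only, not part of the proof.

```python
import math

def kernel_3d(t, r):
    """Three-dimensional Brownian transition density at distance r after time t."""
    return math.exp(-r * r / (2.0 * t)) / (2.0 * math.pi * t) ** 1.5

def truncated_green(r, T, n=100000):
    """Midpoint-rule value of int_0^T kernel_3d(t, r) dt."""
    h = T / n
    return h * sum(kernel_3d((k + 0.5) * h, r) for k in range(n))

def truncated_green_closed_form(r, T):
    """erfc(r / sqrt(2T)) / (2 pi r), from the substitution w^2 = r^2 / t."""
    return math.erfc(r / math.sqrt(2.0 * T)) / (2.0 * math.pi * r)

r = 1.0
for T in (1.0, 10.0, 100.0):
    print(T, truncated_green(r, T), truncated_green_closed_form(r, T))
print("limit 1/(2 pi r):", 1.0 / (2.0 * math.pi * r))
```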


Proposition 7-4 is a precise statement of the connection between
Brownian motion and potential theory in one direction. We see that
formally the potential operator is

$$\lim_{T \to \infty} \int_0^T P^t\, dt.$$
Thus to complete the proof of the equivalence of the two theories, what 
we need to do essentially is recover a sequence from its limit. Of 
course, we cannot do so unless we know some other properties of the 
sequence, and it is the isolation of these properties that makes this half 
of the equivalence difficult. We shall not go into the details here, but 
we can indicate the general approach to the problem. 

Let $C_0$ denote the set of continuous real-valued functions $f$ on $\mathbb{R}^3$
which vanish at infinity; that is, which are such that for any $\epsilon > 0$ there
is a ball of finite radius in $\mathbb{R}^3$ outside of which $f$ is everywhere less than $\epsilon$
in absolute value. We define $Q^t$ on $C_0$ by

$$(Q^t f)(y) = \int_{\mathbb{R}^3} \frac{1}{(2\pi t)^{3/2}}\, e^{-|x-y|^2/2t} f(x)\, dx.$$

The following facts can be checked:
(1) If $f$ is in $C_0$, then so is $Q^t f$.
(2) $\sup_y |(Q^t f)(y)| \le \sup_y |f(y)|$.
(3) $Q^{t+s} = Q^t Q^s$.
(4) $(Q^t f)(y)$ converges uniformly to $f(y)$ as $t$ decreases to 0.
(5) $(Q^t f)(y)$ converges uniformly to 0 as $t$ increases to $\infty$.
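Properties (2) through (5) can be seen concretely on a Gaussian test function. This example assumes one dimension and takes $f(x) = e^{-x^2/2\sigma^2}$. For that $f$, a direct calculation gives

$$(Q^t f)(y) = \frac{\sigma}{\sqrt{\sigma^2 + t}}\, e^{-y^2/2(\sigma^2 + t)}.$$

So $Q^t$ only widens the Gaussian:

- the supremum $\sigma/\sqrt{\sigma^2+t}$ never exceeds $\sup |f| = 1$ and tends to 0 as $t \to \infty$;
- the formula returns $f$ at $t = 0$;
- applying it with $s$ and then with $t$ gives the same result as applying it with $t + s$.

The sketch compares this closed form with quadrature. The dimension and the choice of $f$ are illustrative assumptions.

```python
import math

def q_t_gaussian(t, sigma2, y):
    """Closed form of (Q^t f)(y) for f(x) = exp(-x^2 / (2 sigma2)) in one dimension."""
    s2 = sigma2 + t
    return math.sqrt(sigma2 / s2) * math.exp(-y * y / (2.0 * s2))

def q_t_numeric(t, f, y, width=30.0, n=20000):
    """Midpoint-rule value of int (2 pi t)^(-1/2) exp(-(x - y)^2 / 2t) f(x) dx."""
    h = 2.0 * width / n
    total = 0.0
    for k in range(n):
        x = y - width + (k + 0.5) * h
        total += math.exp(-(x - y) ** 2 / (2.0 * t)) * f(x)
    return total * h / math.sqrt(2.0 * math.pi * t)

sigma2 = 1.0
f = lambda x: math.exp(-x * x / (2.0 * sigma2))
print(q_t_numeric(0.5, f, 0.8), q_t_gaussian(0.5, sigma2, 0.8))

# Property (3): Q^s followed by Q^t equals Q^(s+t); for this f the variances simply add.
s, t = 0.3, 1.1
f_s = lambda x: q_t_gaussian(s, sigma2, x)
print(q_t_numeric(t, f_s, 0.4), q_t_gaussian(s + t, sigma2, 0.4))

# Property (5): the supremum sqrt(sigma2 / (sigma2 + t)), attained at y = 0, tends to 0.
print([q_t_gaussian(big_t, sigma2, 0.0) for big_t in (1.0, 100.0, 10000.0)])
```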


Now any set function in the class $M$ of differences of finite measures is
completely determined by its effect on all the functions in $C_0$, and a
direct calculation shows that

$$\int_{\mathbb{R}^3} f\, d(\mu P^t) = \int_{\mathbb{R}^3} (Q^t f)\, d\mu$$

for every $\mu \in M$ and $f \in C_0$. It follows that $Q^t$ and this equation
completely determine $P^t$. Therefore, it is enough to recover $Q^t$ from
the potential operator in order to prove our result.

For every $f \in C_0$ such that $(1/t)[(Q^t f)(y) - f(y)]$ converges uniformly
as $t$ decreases to 0, we define

$$Af = \lim_{t \downarrow 0} \frac{1}{t}\, [Q^t f - f].$$

It turns out that if $A$ is known on its entire domain of definition, then
$Q^t$ is completely determined by the definition of $A$ and by the first four
properties of $Q^t$ listed above. Thus if $A$ could be defined within
potential theory, then so could each of the operators $Q^t$: They are the
unique family of operators such that the definition of $A$ and properties
(1), (2), (3), and (4) hold. The actual proof of this existence and
uniqueness consists in writing down a concrete formula for $Q^t$ in terms
of $A$ and $t$; we reproduce it in order to show that nothing appears in the
formula except $A$, $t$, and the identity operator $I$:

$$Q^t f = \lim_{\lambda \to \infty} \sum_{n=0}^{\infty} \frac{t^n}{n!} \left[ \lambda^2 (\lambda I - A)^{-1} - \lambda I \right]^n f.$$
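One formula of this kind is the Yosida approximation, $Q^t f = \lim_{\lambda \to \infty} e^{tA_\lambda} f$ with $A_\lambda = \lambda^2(\lambda I - A)^{-1} - \lambda I$. It can be tried out in the simplest setting, where $A$ is a bounded operator. The sketch below assumes that $A$ is the generator $[[-a, a], [b, -b]]$ of a hypothetical two-state continuous-time Markov chain, whose semigroup $e^{tA}$ is known in closed form. It forms $A_\lambda$, sums the exponential series for $e^{tA_\lambda}$, and shows the error shrinking as $\lambda$ grows. This only illustrates recovering a semigroup from its generator; it is not the Brownian case, where $A$ is unbounded.

```python
import math

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_inv(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

def mat_exp(X, t, terms=80):
    """exp(tX) by its power series sum_n (tX)^n / n!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[v * t / n for v in row] for row in mat_mul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def yosida(A, lam):
    """A_lambda = lam^2 (lam I - A)^(-1) - lam I."""
    R = mat_inv([[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]])
    return [[lam * lam * R[i][j] - (lam if i == j else 0.0) for j in range(2)] for i in range(2)]

def exact_semigroup(a, b, t):
    """Closed-form exp(tA) for A = [[-a, a], [b, -b]]."""
    s, e = a + b, math.exp(-(a + b) * t)
    return [[(b + a * e) / s, (a - a * e) / s], [(b - b * e) / s, (a + b * e) / s]]

a, b, t = 1.0, 2.0, 1.0
A = [[-a, a], [b, -b]]
exact = exact_semigroup(a, b, t)

def error(lam):
    """Largest entrywise difference between exp(t A_lambda) and exp(tA)."""
    approx = mat_exp(yosida(A, lam), t)
    return max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))

for lam in (10.0, 100.0, 10000.0):
    print(lam, error(lam))
```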


For every $f$ in $C_0$ such that $\int_0^T (Q^t f)(y)\, dt$ converges uniformly as
$T \to \infty$, we define

$$Gf = \lim_{T \to \infty} \int_0^T (Q^t f)\, dt.$$

It can be shown readily from the five properties of $Q^t$ that $G$ and $-A$
are inverse operators on their respective domains. Thus each uniquely
determines the other. Finally (and here is where some work is required)
$G$ looks sufficiently like the potential operator when its definition is
compared with the formulas of Proposition 7-4 that the potential
operator determines $G$. Thus the potential operator determines $G$,
$G$ determines $A$, $A$ determines $Q^t$, and $Q^t$ determines $P^t$. Hence every
theorem of Brownian motion theory is a theorem of potential theory.
4. Brownian motion and potential theory in n dimensions


In the mathematical formulation of Brownian motion and potential
theory there is no need to restrict the underlying space to be three-dimensional.
We can define an $n$-dimensional Brownian motion operator $P^t$ by

$$(\mu P^t)(E) = \int_E \left[ \int_{\mathbb{R}^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x-y|^2/2t}\, d\mu(x) \right] dy.$$


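The potential operator differs in appearance from dimension to dimension
more than the Brownian motion operator does, but its kernel is
still, up to a constant multiple, an indefinite integral of $|x|^{1-n}$ with respect to $|x|$. The potential
$g(x)$ of $\mu$, the difference of two finite measures, is defined by

$$
g(x) =
\begin{cases}
-\displaystyle\int_{\mathbb{R}^1} |x - y|\, d\mu(y) & \text{in dimension } 1, \\[2ex]
-\dfrac{1}{\pi} \displaystyle\int_{\mathbb{R}^2} \log |x - y|\, d\mu(y) & \text{in dimension } 2, \\[2ex]
\dfrac{1}{c_n} \displaystyle\int_{\mathbb{R}^n} \frac{1}{|x - y|^{n-2}}\, d\mu(y) & \text{in dimension } n \ge 3,
\end{cases}
$$

where

$$c_n = \frac{2\pi^{n/2}}{\Gamma\bigl(\tfrac{1}{2}(n - 2)\bigr)}.$$

In dimension $n \ge 3$, $g(x)$ is necessarily finite a.e., but in dimensions 1
and 2 we shall need to assume that $g$ is finite a.e.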
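The constant $c_n$ can be traced back to Proposition 7-4. Substituting $u = r^2/2t$ turns $\int_0^\infty (2\pi t)^{-n/2} e^{-r^2/2t}\, dt$ into a Gamma integral equal to $r^{2-n}\, \Gamma(\tfrac{n}{2} - 1)/(2\pi^{n/2})$.

So for $n \ge 3$, the kernel $1/(c_n |x-y|^{n-2})$ with $c_n = 2\pi^{n/2}/\Gamma(\tfrac12(n-2))$ is exactly the time integral of the Brownian transition density. For $n = 3$ this gives $c_3 = 2\pi$, the constant of Section 2.

The sketch checks the identity by midpoint-rule quadrature after the substitution $t = e^s$. The cut-offs for $s$ are arbitrary choices.

```python
import math

def green_integral(n, r, s_lo=-10.0, s_hi=80.0, steps=20000):
    """int_0^inf (2 pi t)^(-n/2) exp(-r^2 / 2t) dt, computed with t = e^s and the midpoint rule."""
    h = (s_hi - s_lo) / steps
    total = 0.0
    for k in range(steps):
        t = math.exp(s_lo + (k + 0.5) * h)
        # dt = t ds, hence the extra factor t
        total += (2.0 * math.pi * t) ** (-n / 2.0) * math.exp(-r * r / (2.0 * t)) * t
    return total * h

def c(n):
    """c_n = 2 pi^(n/2) / Gamma((n - 2) / 2), for n >= 3."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma((n - 2) / 2.0)

r = 1.7
for n in (3, 4, 5, 7):
    print(n, green_integral(n, r), r ** (2 - n) / c(n))
print("c_3 =", c(3), " 2*pi =", 2.0 * math.pi)
```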

The fact that the kernel $1/|x|^{n-2}$ tends to zero at infinity in dimension
$n \ge 3$ but the kernels $|x - y|$ and $\log |x - y|$ do not tend to zero in
dimensions 1 and 2 gives us a clue that the potential theory of dimensions
1 and 2 will differ sharply from that in higher dimensions. We
shall discuss the reason for this difference shortly.

In dimension $n \ge 3$, Brownian motion theory and potential theory
are again equivalent. The formula

$$g(x) = \lim_{T \to \infty} \int_0^T \int_{\mathbb{R}^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x-y|^2/2t}\, d\mu(y)\, dt,$$

generalized from Proposition 7-4, is still valid, and the discussion of
Section 3 goes over with little change to establish the equivalence.
But in dimensions 1 and 2 it does not. The above formula is not
true for these dimensions, and the argument in Section 3 fails after the
operator $G$ is introduced. The reason for this failure is the following.
We recall that in dimension $n \ge 3$ the potential operator is formally
$\lim_{T \to \infty} \int_0^T P^t\, dt$. In dimensions greater than or equal to 3, this
quantity is finite, whereas in dimensions 1 and 2 it is infinite. Now




$\lim_{T \to \infty} \int_0^T P^t\, dt$ plays much the same role for Brownian motion that
$\sum_{n=0}^{\infty} P^n$ plays for denumerable Markov chains. It is finite if the
process is transient, and infinite if the process is recurrent. In fact, the
distinction between transience and recurrence is what is relevant for
Brownian motion here: In dimensions 3 and greater a Brownian motion
particle after leaving the unit ball of $\mathbb{R}^n$ returns to it with probability
less than one, whereas in dimensions 1 and 2 it returns with probability
equal to one.

The potential operator in dimensions 1 and 2 arises in a different way. 
Specifically, the formula generalized from Proposition 7-4 is not valid in 
general, but it is valid if $\mu$ has total charge zero and if a mild additional
condition is satisfied. The exact formulation of this result will interest 
us later, and we give it as the next proposition. 


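Proposition 7-5: In $\mathbb{R}^n$ for $n = 1$ or 2, let $\mu = \mu^+ - \mu^-$ be the
difference of two finite measures, and suppose that

$$\int_{\mathbb{R}^1} |x - y|\, d\mu^+(y) < \infty \quad\text{and}\quad \int_{\mathbb{R}^1} |x - y|\, d\mu^-(y) < \infty \quad \text{a.e. if } n = 1$$

or

$$\int_{\mathbb{R}^2} \bigl|\log |x - y|\bigr|\, d\mu^+(y) < \infty \quad\text{and}\quad \int_{\mathbb{R}^2} \bigl|\log |x - y|\bigr|\, d\mu^-(y) < \infty \quad \text{a.e. if } n = 2.$$

If $\mu(\mathbb{R}^n) \ne 0$, then

$$\lim_{T \to \infty} \int_0^T \int_{\mathbb{R}^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x-y|^2/2t}\, d\mu(y)\, dt = +\infty \text{ or } -\infty \quad \text{a.e.}$$

If $\mu(\mathbb{R}^n) = 0$, then

$$g(x) = \lim_{T \to \infty} \int_0^T \int_{\mathbb{R}^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x-y|^2/2t}\, d\mu(y)\, dt$$

exists, is finite a.e., and satisfies

$$
g(x) =
\begin{cases}
-\displaystyle\int_{\mathbb{R}^1} |x - y|\, d\mu(y) & \text{if } n = 1, \\[2ex]
-\dfrac{1}{\pi} \displaystyle\int_{\mathbb{R}^2} \log |x - y|\, d\mu(y) & \text{if } n = 2.
\end{cases}
$$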


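Proof: We prove the result for $n = 1$; the ideas in the proof for
$n = 2$ are similar. The same calculation as at the beginning of the
proof of Proposition 7-4 shows that for $n = 1$

$$\int_0^T \int_{\mathbb{R}^1} \frac{1}{\sqrt{2\pi t}}\, e^{-|x-y|^2/2t}\, d\mu(y)\, dt = \sqrt{\frac{2}{\pi}} \int_{\mathbb{R}^1} |x - y| \left[ \int_{|x-y|/\sqrt{T}}^{\infty} \frac{1}{u^2}\, e^{-u^2/2}\, du \right] d\mu(y).$$

If we use integration by parts on the term in brackets, differentiating
the exponential and integrating $u^{-2}$, we find that the right side is

$$
\begin{aligned}
&\sqrt{\frac{2}{\pi}} \int_{\mathbb{R}^1} |x - y| \left[ \frac{\sqrt{T}}{|x - y|}\, e^{-|x-y|^2/2T} - \int_{|x-y|/\sqrt{T}}^{\infty} e^{-u^2/2}\, du \right] d\mu(y) \\
&\quad = -\int_{\mathbb{R}^1} |x - y| \left[ \sqrt{\frac{2}{\pi}} \int_{|x-y|/\sqrt{T}}^{\infty} e^{-u^2/2}\, du \right] d\mu(y)
+ \sqrt{\frac{2}{\pi}} \int_{\mathbb{R}^1} \sqrt{T}\, e^{-|x-y|^2/2T}\, d\mu(y).
\end{aligned}
$$

We let $T$ tend to $\infty$ and consider each term separately. In the first
term the expression in brackets increases to 1. If we write the integral
as the difference of one with respect to $\mu^+$ and one with respect to $\mu^-$
and use the fact that

$$\int_{\mathbb{R}^1} |x - y|\, d\mu^+(y) < \infty \quad\text{and}\quad \int_{\mathbb{R}^1} |x - y|\, d\mu^-(y) < \infty \quad \text{a.e.},$$

we see by the Dominated Convergence Theorem that the first term tends
a.e. to

$$-\int_{\mathbb{R}^1} |x - y|\, d\mu(y).$$

Next we consider the second term. Suppose first that $\mu(\mathbb{R}^1) \ne 0$.
The second term tends, as $T \to \infty$, to

$$\lim_{T \to \infty} \sqrt{T} \cdot \lim_{T \to \infty} \sqrt{\frac{2}{\pi}} \int_{\mathbb{R}^1} e^{-|x-y|^2/2T}\, d\mu(y),$$

and the integral and the second limit may be interchanged by dominated
convergence to become

$$\sqrt{\frac{2}{\pi}}\, \mu(\mathbb{R}^1) \cdot \lim_{T \to \infty} \sqrt{T} = \pm\infty,$$

according as $\mu(\mathbb{R}^1)$ is positive or negative. Since the first term has a
finite limit a.e., the whole expression tends to $+\infty$ or $-\infty$ a.e.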
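To complete the proof we shall show that if $\mu(\mathbb{R}^1) = 0$, then

$$\int_{\mathbb{R}^1} \sqrt{T}\, e^{-|x-y|^2/2T}\, d\mu(y)$$

tends to zero a.e. as $T \to \infty$. Since $\mu(\mathbb{R}^1) = 0$, we have

$$\int_{-\infty}^{\infty} \sqrt{T}\, e^{-|x-y|^2/2T}\, d\mu(y) = \int_{-\infty}^{\infty} \sqrt{T} \left( e^{-|x-y|^2/2T} - 1 \right) d\mu(y).$$

We shall prove that the right side even tends to zero when $\mu$ is replaced
by $\mu^+$ or $\mu^-$. Let us do so for $\mu^+$. First we show that

$$\left| \sqrt{T} \left( e^{-|z|^2/2T} - 1 \right) \right| \le k|z|$$

for a fixed constant $k$ and for all $T$. By differential calculus methods,
we find that

$$\frac{1 - e^{-|z|^2/2T}}{|z|}$$

assumes its maximum value as a function of $|z|$ when $|z|$ satisfies

$$\frac{|z|^2}{T} = e^{|z|^2/2T} - 1.$$

The unique positive solution occurs for $2 < |z|^2/T < 3$. Let $b = |z|^2/T$
with $2 < b < 3$ be the point at which the solution occurs. Then

$$\frac{1 - e^{-|z|^2/2T}}{|z|} \le \frac{1}{\sqrt{T}} \cdot \frac{1 - e^{-b/2}}{\sqrt{b}},$$

and, with $k = (1 - e^{-b/2})/\sqrt{b}$,

$$\left| \sqrt{T} \left( e^{-|z|^2/2T} - 1 \right) \right| \le k|z|.$$

Put $|z| = |x - y|$. Since

$$\int_{\mathbb{R}^1} k|x - y|\, d\mu^+(y) < \infty \quad \text{a.e.},$$

we have by dominated convergence

$$\lim_{T \to \infty} \int_{-\infty}^{\infty} \sqrt{T} \left( e^{-|x-y|^2/2T} - 1 \right) d\mu^+(y) = \int_{-\infty}^{\infty} \lim_{T \to \infty} \left[ \sqrt{T} \left( e^{-|x-y|^2/2T} - 1 \right) \right] d\mu^+(y)$$

for almost every $x$. The integrand on the right side is identically zero.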
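The constants in this step are easy to pin down numerically. The minimal sketch below does three things:

1. It solves $b = e^{b/2} - 1$ on $(2, 3)$ by bisection.
2. It forms $k = (1 - e^{-b/2})/\sqrt{b}$.
3. It spot-checks $|\sqrt{T}(e^{-z^2/2T} - 1)| \le k|z|$ on a grid of $z$ and $T$.

The grids are arbitrary choices. Writing $z = c\sqrt{T}$ shows that the ratio of the two sides depends only on $c$, so the largest ratio found should come close to 1 without exceeding it.

```python
import math

def solve_b(lo=2.0, hi=3.0, steps=100):
    """Bisection for the root of e^(b/2) - 1 - b on (2, 3)."""
    g = lambda b: math.exp(b / 2.0) - 1.0 - b
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

b = solve_b()
k = (1.0 - math.exp(-b / 2.0)) / math.sqrt(b)
print("b =", b, " k =", k)

worst = 0.0
for T in (0.01, 1.0, 100.0, 1e4):
    for i in range(1, 2001):
        z = i * 0.05 * math.sqrt(T)   # z runs over (0, 100 sqrt(T)]
        lhs = abs(math.sqrt(T) * (math.exp(-z * z / (2.0 * T)) - 1.0))
        worst = max(worst, lhs / (k * z))
print("largest ratio |sqrt(T)(e^(-z^2/2T) - 1)| / (k z):", worst)
```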


The hypotheses of Propositions 7-4 and 7-5 are worth reviewing and
comparing. In the transient case, Proposition 7-4, we started with
any element $\mu$ of $M$ and we were able to conclude both that the potential
operator was defined on $\mu$ and that its value was the Brownian motion
limit. In the recurrent case, Proposition 7-5, we started with any
element $\mu$ of $M$ and we had to assume that the potential operator was
defined on $\mu$; we then concluded that the potential of $\mu$ was equal to
the Brownian motion limit if and only if $\mu$ had total charge zero. We
shall see that the same thing happens in potential theory defined for 
denumerable Markov chains. 
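The dichotomy can be watched numerically in dimension 1. For $r \ge 0$, integrating the one-dimensional density in $t$ gives

$$\int_0^T (2\pi t)^{-1/2} e^{-r^2/2t}\, dt = \sqrt{2T/\pi}\; e^{-r^2/2T} - r\, \operatorname{erfc}\bigl(r/\sqrt{2T}\bigr).$$

This is the integration by parts in the proof of Proposition 7-5, written with erfc. The sketch applies it to two hypothetical discrete charges:

- $\mu = \delta_0 - \delta_1$, of total charge zero. Its truncated integrals settle down to $-\int |x - y|\, d\mu(y) = |x - 1| - |x|$.
- $\mu = \delta_0$, of total charge 1. Its truncated integrals grow like $\sqrt{T}$.

```python
import math

def truncated_kernel_1d(r, T):
    """int_0^T (2 pi t)^(-1/2) exp(-r^2 / 2t) dt in closed form, for r >= 0."""
    return (math.sqrt(2.0 * T / math.pi) * math.exp(-r * r / (2.0 * T))
            - r * math.erfc(r / math.sqrt(2.0 * T)))

def truncated_potential(x, charges, T):
    """The same double integral for mu = sum_i q_i delta_{y_i}, given as (q_i, y_i) pairs."""
    return sum(q * truncated_kernel_1d(abs(x - y), T) for q, y in charges)

def numeric_check(r, T, n=100000):
    """Midpoint-rule value of the t-integral, used to check the closed form."""
    h = T / n
    return h * sum(math.exp(-r * r / (2.0 * (k + 0.5) * h)) / math.sqrt(2.0 * math.pi * (k + 0.5) * h)
                   for k in range(n))

zero_charge = [(1.0, 0.0), (-1.0, 1.0)]   # mu = delta_0 - delta_1, total charge 0
unit_charge = [(1.0, 0.0)]                # mu = delta_0, total charge 1
x = 0.3
print(truncated_kernel_1d(0.7, 4.0), numeric_check(0.7, 4.0))
for T in (1e2, 1e4, 1e6):
    print(T, truncated_potential(x, zero_charge, T), truncated_potential(x, unit_charge, T))
print("limit for zero total charge:", abs(x - 1.0) - abs(x))
```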


5. Potential theory for denumerable Markov chains 


We turn our attention now to those properties of Brownian motion 
which relate it to potential theory. In this section we shall answer 
the following questions: 


(1) How can the connection between Brownian motion and classical 
potential theory be used to define a potential theory for denumerable 
Markov chains? 

(2) How does potential theory differ in the transient and recurrent 
cases, and what form does the potential operator take? 

(3) What is the nature of the inverse operator that transforms 
potentials into charges? 

(4) What other Markov chain concepts play a role in potential 
theory? 


For definiteness, let $P$ be a denumerable Markov chain which either is
recurrent or is transient with no absorbing states, and let $\alpha$ be a positive
finite-valued $P$-superregular measure.

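Before defining a potential theory for denumerable Markov chains,
we should discuss some properties of the operators $P^t$ and $Q^t$. The
operators $P^t$ and $Q^t$ act respectively on differences of finite measures
and on functions in $C_0$ according to the equations

$$(\mu P^t)(E) = \int_E \left[ \int_{\mathbb{R}^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x-y|^2/2t}\, d\mu(x) \right] dy,$$

$$(Q^t f)(y) = \int_{\mathbb{R}^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x-y|^2/2t} f(x)\, dx,$$

and they are related by the identity

$$\int_{\mathbb{R}^n} f\, d(\mu P^t) = \int_{\mathbb{R}^n} (Q^t f)\, d\mu.$$

The linearity properties

$$(\mu + \nu)P^t = \mu P^t + \nu P^t \quad\text{and}\quad (c\mu)P^t = c(\mu P^t),$$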




together with a certain property of continuity, imply that the action of
$P^t$ on differences of finite measures is analogous to the action of a matrix
on row vectors. Similarly, the corresponding properties for $Q^t$ imply
that the action of $Q^t$ on functions in $C_0$ is like that of a matrix on column
vectors. But the real insight into $P^t$ and $Q^t$ comes in realizing that
these matrices are identical. To be more specific, we must reformulate
the assertion for a countable space.

Let $P^*$ be a continuous linear operator on row vectors $\mu$ which have a
finite sum, and let $Q^*$ be a continuous linear operator on column vectors
$f$ whose components tend to 0. Suppose that $P^*$ and $Q^*$ are related by
the identity

$$\mu \cdot (Q^* f) = (\mu P^*) \cdot f$$

for all $\mu$ and $f$ of the type specified, where $\cdot$ stands for vector multiplication.
Let $\delta^{(i)}$ and $d^{(i)}$ be the row and column vectors having $i$th
component equal to 1 and all other components equal to 0. The vector
$\delta^{(i)}$ is in the domain of $P^*$, and $d^{(i)}$ is in the domain of $Q^*$. If we define
a square matrix $\{P^*_{ij}\}$ by

$$P^*_{ij} = (\delta^{(i)} P^*) \cdot d^{(j)} = (\delta^{(i)} P^*)_j,$$

then

$$(\mu P^*)_j = \sum_k \mu_k (\delta^{(k)} P^*)_j = \sum_k \mu_k P^*_{kj}.$$

Hence the operator $P^*$ may be represented by the matrix $\{P^*_{ij}\}$. But
by the identity relating $P^*$ and $Q^*$,

$$P^*_{ij} = \delta^{(i)} \cdot (Q^* d^{(j)}).$$

Hence by a similar argument, the operator $Q^*$ is also representable by
the same matrix $\{P^*_{ij}\}$.
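On a finite state space the argument can be replayed directly. In the sketch below, `Q_star` is a hypothetical operator on column vectors: a lazy shift on a 3-cycle, chosen to be non-symmetric so that rows and columns can be told apart. The row-vector operator `P_star` is then built from it through the identity $(\mu P^*)_j = \mu \cdot (Q^* d^{(j)})$. The code then checks three things:

- reading off entries as $(\delta^{(i)}P^*)_j$ and as $\delta^{(i)} \cdot (Q^* d^{(j)})$ gives one matrix;
- $P^*$ acts as "row vector times that matrix";
- $Q^*$ acts as "that matrix times column vector".

```python
n = 3

def Q_star(f):
    """Hypothetical operator on column vectors: (Q* f)_i = (f_i + f_{i+1}) / 2 on a 3-cycle."""
    return [0.5 * f[i] + 0.5 * f[(i + 1) % n] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(i):
    """Vector with a 1 in place i and 0 elsewhere (delta^(i) as a row, d^(i) as a column)."""
    return [1.0 if k == i else 0.0 for k in range(n)]

def P_star(mu):
    """Row-vector operator forced by mu . (Q* f) = (mu P*) . f, taking f = d^(j)."""
    return [dot(mu, Q_star(unit(j))) for j in range(n)]

# Matrix entries read off from each operator, as in the text.
from_P = [[P_star(unit(i))[j] for j in range(n)] for i in range(n)]
from_Q = [[dot(unit(i), Q_star(unit(j))) for j in range(n)] for i in range(n)]

mu, f = [0.2, -0.7, 1.5], [3.0, 1.0, -2.0]
row_times_matrix = [sum(mu[k] * from_P[k][j] for k in range(n)) for j in range(n)]
matrix_times_col = [sum(from_Q[i][k] * f[k] for k in range(n)) for i in range(n)]
print(from_P == from_Q)
print(P_star(mu), row_times_matrix)
print(Q_star(f), matrix_times_col)
```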

Thus the denumerable Markov chain analog of the pair of operators
$P^t$ and $Q^t$ can be expected to be a single matrix depending on $t$. For
$t = 1$, this matrix can be taken to be the transition matrix $P$ of the
Markov chain. Then the relations $P^{t+s} = P^t P^s$ and $Q^{t+s} = Q^t Q^s$ for
integers $t$ and $s$ imply that the analog of $P^t$ and $Q^t$ for any other
integer value of $t$ is a power of the matrix $P$.

Lebesgue measure has a special property with respect to Brownian
motion which is summed up in the equation

$$\int_E \left[ \int_{\mathbb{R}^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x-y|^2/2t}\, dx \right] dy = \int_E dy.$$

If we call Lebesgue measure $\sigma$ and use notation that earlier was
reserved for finite measures, this equation becomes

$$\sigma = \sigma P^t.$$

That is, $\sigma$ is regular for $P^t$. Thus the analog of $\sigma$ for the Markov chain
$P$ should be a $P$-regular measure. But if $P$ is transient, then $P$ need
not have a regular measure. We therefore relax our requirement and
ask only that the analog of $\sigma$ be a $P$-superregular measure. We can
then decree that the specified measure $\alpha$ is to be the analog of $\sigma$.

The problem of defining potentials for Markov chains becomes a
problem of translating notions about Brownian motion into notions
about Markov chains. Following Propositions 7-4 and 7-5, we recall
that a potential $g(x)$ is obtained from a charge $\mu$ in this way: If we
abbreviate the equation

$$(\mu P^t)(E) = \int_E \left[ \int_{\mathbb{R}^n} \frac{1}{(2\pi t)^{n/2}}\, e^{-|x-y|^2/2t}\, d\mu(y) \right] dx$$

as

$$(\mu P^t)(E) = \int_E f(x, t)\, dx,$$

then

$$g(x) = \lim_{T \to \infty} \int_0^T f(x, t)\, dt.$$

Translating the relation for $(\mu P^t)(E)$ into notions about Markov chains,
we write

$$\sum_{i \in E} (\mu P^n)_i = \sum_{i \in E} f_i^{(n)} \alpha_i.$$

If $E$ is the one-point set $\{i\}$, we find that

$$f_i^{(n)} = \frac{1}{\alpha_i}\, (\mu P^n)_i$$

or

$$f^{(n)} = \operatorname{dual}(\mu P^n).$$

The equation defining $g(x)$ translates into

$$g = \lim_{n \to \infty} \left[ f^{(0)} + f^{(1)} + \cdots + f^{(n)} \right]$$

or

$$\boxed{g = \operatorname{dual} \lim_{n \to \infty} \left[ \mu (I + P + \cdots + P^n) \right]}$$

Classically, potentials are left as point functions and are never transformed
into set functions because such a transformation is frequently
impossible. In Markov chain potential theory, however, every column
vector can be transformed into a row vector by the duality mapping.
If we take the dual of both sides of the boxed equation, we get

$$\operatorname{dual} g = \lim_{n \to \infty} \left[ \mu (I + P + \cdots + P^n) \right].$$

For simplicity in notation we shall adopt the convention that $\operatorname{dual} g$,
and not $g$, is the potential of $\mu$. We can at last formulate a definition.


Definition 7-6: Any row vector $\mu$ with $\mu\mathbf{1}$ well defined and finite for
which the limit

$$\nu = \lim_{n \to \infty} \left[ \mu (I + P + \cdots + P^n) \right]$$

exists and is finite-valued is called a left charge with potential measure $\nu$.


The condition that $\mu\mathbf{1}$ be well defined and finite is the analog of the
condition that a charge in $\mathbb{R}^n$ be the difference of two finite measures.
The boxed equation for $g$ yields an alternate possibility, namely

$$g = \lim_{n \to \infty} \left[ (I + \hat P + \cdots + \hat P^n)(\operatorname{dual} \mu) \right]$$

with $\alpha(\operatorname{dual} \mu) = \mu\mathbf{1}$ finite. If we had gone through the same argument
for the process $\hat P$, we would have obtained the same equation with the
carets removed. We therefore complement Definition 7-6 as follows.


Definition 7-7: Any column vector $f$ with $\alpha f$ well defined and finite
for which the limit

$$g = \lim_{n \to \infty} \left[ (I + P + \cdots + P^n) f \right]$$

exists and is finite-valued is called a right charge with potential function $g$.
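On a finite state space both limits are easy to watch. The minimal sketch below makes these assumptions:

- $P$ is a hypothetical three-state substochastic matrix. Its rows sum to less than 1, so mass leaks out and the chain is transient.
- The charges $\mu$ and $f$ are arbitrary.

The partial sums $\mu(I + P + \cdots + P^n)$ and $(I + P + \cdots + P^n)f$ settle down as $n$ grows, giving the potential measure $\nu$ and the potential function $g$. With finitely many states, the finiteness conditions on $\mu\mathbf{1}$ and $\alpha f$ hold automatically.

```python
# Hypothetical substochastic transition matrix: row sums 0.6, 0.7, 0.6.
P = [[0.2, 0.3, 0.1],
     [0.1, 0.2, 0.4],
     [0.3, 0.1, 0.2]]
size = len(P)

def row_times_matrix(mu, M):
    return [sum(mu[k] * M[k][j] for k in range(size)) for j in range(size)]

def matrix_times_col(M, f):
    return [sum(M[i][k] * f[k] for k in range(size)) for i in range(size)]

def left_partial_sum(mu, n):
    """mu (I + P + ... + P^n), accumulated term by term."""
    total, term = list(mu), list(mu)
    for _ in range(n):
        term = row_times_matrix(term, P)
        total = [a + b for a, b in zip(total, term)]
    return total

def right_partial_sum(f, n):
    """(I + P + ... + P^n) f, accumulated term by term."""
    total, term = list(f), list(f)
    for _ in range(n):
        term = matrix_times_col(P, term)
        total = [a + b for a, b in zip(total, term)]
    return total

mu = [1.0, -0.5, 0.25]   # a left charge (row vector)
f = [0.0, 2.0, -1.0]     # a right charge (column vector)
for n in (5, 20, 80):
    print(n, left_partial_sum(mu, n), right_partial_sum(f, n))
```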


From our knowledge of what happens with Brownian motion, we 
should expect that the Markov chain potential operators will arise in 
different ways in the transient and recurrent cases. Consequently, we 
shall treat the different kinds of processes separately, handling the 
transient case in Chapter 8 and the recurrent case in Chapter 9. 

In the transient case of Brownian motion the operator was formally

$$\lim_{T \to \infty} \int_0^T P^t\, dt = \int_0^\infty P^t\, dt,$$

and it is no surprise that for Markov chains it is the matrix
$N = \sum_{n=0}^{\infty} P^n$ which turns out to be the potential operator both for left
charges and right charges (see Theorem 8-3). Once we have the
potential operator, it will not be difficult to develop a full theory in
analogy with classical potential theory.

In the recurrent case, however, the problem of finding the potential
operator is not so easy. The information that we will find, just as in
Proposition 7-5, is that for any charge of total charge zero on which the
potential operator is defined the potential operator should agree with
the operator which is formally

$$\lim_{T \to \infty} \int_0^T P^t\, dt.$$

It will turn out that there are many possible potential operators for left
charges and many others for right charges. Of these the matrices
$-C$ and $-G$, respectively, will be representative (see Definition 9-24).
But if we ask that the same matrix work for both left and right charges,
then we shall see that there is a matrix $K$ such that all such potential
operators are of the form $-K + c\mathbf{1}\alpha$, where $c$ is a constant (see Theorem
9-84). With $-K$ as our operator, we have some hope of imitating
classical potential theory if we redefine charge and potential in terms of
$K$: The column vector $f$ is a charge, for instance, with potential $g$ if

$$g = -Kf.$$

From this new definition of charges and potentials, we shall be able, just
as in the transient case, to prove theorems which are analogs of some of
the main results of classical potential theory.
In discussing the relation of Brownian motion to potential theory in
Section 3, we mentioned that the operator inverse to $-A$, where

$$Af = \lim_{t \downarrow 0} \frac{1}{t}\, (Q^t f - f),$$

was of the same form as the potential operator. It is thus quite
believable that $-A$ should be essentially the inverse operator that
transforms potentials into charges. Now the definition of $A$ involves a
derivative, and when concepts in Brownian motion are translated into
concepts in Markov chains, derivatives transform into differences.
Therefore, the proper analog of $Af$ for Markov chains is
$Pf - f = (P - I)f$. That is, $I - P$ plays the role of $-A$. In Theorems 8-4
and 9-15 we shall see that $I - P$ is indeed the operator that transforms
potentials in the sense of Definitions 7-6 and 7-7 into charges.
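In the transient case this can be checked on a small example. The sketch keeps to a hypothetical three-state substochastic matrix $P$ and does the following:

1. It computes $N = (I - P)^{-1}$. This equals $\sum_n P^n$ here because the row sums are below 1.
2. It forms the potential measure $\nu = \mu N$ and the potential function $g = Nf$.
3. It confirms that applying $I - P$ (on the right of $\nu$, on the left of $g$) returns $\mu$ and $f$.

Gauss-Jordan elimination is used only as a convenient way to invert $I - P$.

```python
def invert(M):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

P = [[0.2, 0.3, 0.1],
     [0.1, 0.2, 0.4],
     [0.3, 0.1, 0.2]]
size = len(P)
I_minus_P = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(size)] for i in range(size)]
N = invert(I_minus_P)

def power_series(n_terms=200):
    """I + P + ... + P^n_terms, for comparison with N."""
    total = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    term = [row[:] for row in total]
    for _ in range(n_terms):
        term = [[sum(term[i][k] * P[k][j] for k in range(size)) for j in range(size)] for i in range(size)]
        total = [[total[i][j] + term[i][j] for j in range(size)] for i in range(size)]
    return total

def row_times(mu, M):
    return [sum(mu[k] * M[k][j] for k in range(size)) for j in range(size)]

def times_col(M, f):
    return [sum(M[i][k] * f[k] for k in range(size)) for i in range(size)]

mu, f = [1.0, -0.5, 0.25], [0.0, 2.0, -1.0]
nu = row_times(mu, N)    # potential measure of the left charge mu
g = times_col(N, f)      # potential function of the right charge f
print(row_times(nu, I_minus_P), mu)   # nu (I - P) gives back mu
print(times_col(I_minus_P, g), f)     # (I - P) g gives back f
```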

With Brownian motion the operator $A$ is a constant multiple of the
Laplacian operator $\Delta$ for smooth enough functions, where

$$\Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}.$$
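For the transition density $(2\pi t)^{-n/2} e^{-|x-y|^2/2t}$ used in this chapter, the constant is $1/2$, that is, $Af = \tfrac12 \Delta f$. Here is a quick one-dimensional check. It assumes the Gaussian test function $f(x) = e^{-x^2/2}$, for which

$$(Q^t f)(y) = (1+t)^{-1/2}\, e^{-y^2/2(1+t)}$$

in closed form. As $t$ decreases to 0, the difference quotient $(Q^t f - f)/t$ should approach $\tfrac12 f''(y) = \tfrac12 (y^2 - 1) e^{-y^2/2}$.

```python
import math

def q_t_f(t, y):
    """(Q^t f)(y) for f(x) = exp(-x^2/2): a Gaussian of variance 1 + t, rescaled."""
    return math.exp(-y * y / (2.0 * (1.0 + t))) / math.sqrt(1.0 + t)

def half_second_derivative(y):
    """(1/2) f''(y) for f(x) = exp(-x^2/2)."""
    return 0.5 * (y * y - 1.0) * math.exp(-y * y / 2.0)

y = 0.6
for t in (1e-1, 1e-3, 1e-5):
    quotient = (q_t_f(t, y) - q_t_f(0.0, y)) / t
    print(t, quotient, half_second_derivative(y))
```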


If a function $f$ satisfies the equation $\Delta f = 0$ in a neighborhood of a
point $x$, then $f$ is said to be harmonic in a neighborhood of $x$. The
analog in the case of denumerable Markov chains is that if a column
vector $f$ satisfies $(P - I)f = 0$ at the point $i$, then $f$ is regular at $i$.
Thus we can expect that regular functions will have some of the same 
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behavior for Markov chains that harmonic functions have classically. 
As an example, a function harmonic on a connected open set in R” 
cannot assume its maximum value inside that open set unless the 
function is constant. We shall see in Corollary 8-44 that an analogous 
result holds for Markov chains. 

A twice continuously differentiable function $f$ is said to be superharmonic if $\Delta f \le 0$. The analog of this property is the condition that $(P - I)f \le 0$, or $f \ge Pf$. Thus the analog of a superharmonic function is a superregular function.
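The correspondence between $P - I$ and the Laplacian can be checked by hand on the one-dimensional symmetric walk. The sketch below is a minimal illustration, and the function name `walk_generator` is ours rather than the book's. It evaluates $((P - I)f)(x)$ for the walk with steps $\pm h$ and shows three things: linear functions are regular, $-x^2$ is superregular, and after division by $h^2/2$ the operator approaches the second derivative $f''$.

```python
import math

def walk_generator(f, x, h=1):
    """((P - I)f)(x) for the symmetric walk with steps +h and -h, each with probability 1/2."""
    return 0.5 * (f(x + h) + f(x - h)) - f(x)

sites = range(-5, 6)
linear = [walk_generator(lambda i: 3 * i + 1, i) for i in sites]   # (P - I)f = 0: regular
square = [walk_generator(lambda i: i * i, i) for i in sites]         # (P - I)f = 1
concave = [walk_generator(lambda i: -i * i, i) for i in sites]       # (P - I)f = -1: superregular

# With lattice spacing h, ((P - I)f)(x) / (h^2 / 2) is the centered second difference,
# which tends to f''(x); for f = sin the limit is -sin(x).
x, h = 0.7, 1e-3
laplacian_estimate = walk_generator(math.sin, x, h) / (h * h / 2)
print(linear[0], square[0], concave[0], round(laplacian_estimate, 6))
```

The second-difference form is why "derivatives transform into differences". On the lattice, $(P - I)f$ is, up to the factor $h^2/2$, a discrete version of $f''$.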


TABLE 7-2. MARKOV CHAIN ANALOGS OF POTENTIAL THEORY CONCEPTS

| Classical notion | Markov chain analog |
|---|---|
| $R^n$ | State space $S$ |
| $P^t$ and $Q^t$ | $P^n$ |
| Lebesgue measure | $\alpha$ |
| Potential | $\lim_n \mu(I + P + \cdots + P^n)$ or $\lim_n (I + P + \cdots + P^n)f$ |
| Total charge | $\mu\mathbf{1}$ or $\alpha f$ |
| Potential operator | Transient: $N$; recurrent: $-K$ |
| Inverse operator | $I - P$ |
| Harmonic function | Regular function |
| Superharmonic function | Superregular function |
| Connected set | Communicating set |


6. Brownian motion as a limit of the symmetric random walk 


The symmetric random walk in $n$ dimensions was defined in Chapter 4 as a Markov chain obtained from sums of independent random variables on the integer lattice in $R^n$, with the probability of going from any state to any of the $2n$ neighboring states equal to $1/(2n)$. In potential theory for Markov chains this process assumes the role of the "classical case," exhibiting in its potential theory much of the special behavior of the theory in Section 2. For instance, the matrix of the potential operator for this process has the same asymptotic behavior at infinity as the potential kernel has classically: $\log|x|$ in two dimensions, $1/|x|$ in three dimensions, and so on.

The reason for this coincidence is that Brownian motion is, in a precise sense, the limit of the symmetric random walk. Specifically, suppose the random walk is considered first on the integer lattice, then on the half-integer lattice, then on the quarter-integer lattice, and so on. Then the probabilities in the $k$th process of being in a fixed ball in $R^n$ after time $4^k t$ converge to the probability in Brownian motion of being in that ball after time $t$. We shall prove this result only in the one-dimensional case, and we make use of the Central Limit Theorem (Theorem 1-68).

Consider for fixed $k$ the random walk on the line having as states the points of the form $j2^{-k}$, for $j$ any integer, and having transition probabilities $\tfrac12$ from any state to each of its two neighbors. This process is the symmetric random walk with a change in scale. Let $x^{(k)}_n$ be the $n$th outcome function, and let $x^{(k)}_0 = 0$. As in Section 1, we let $x_t$ denote the position in Brownian motion.


Proposition 7-8: Brownian motion in one dimension is a limit of the symmetric random walk in this sense: If $t$ is a dyadic rational and if $\alpha$ and $\beta$ are real numbers, then

$$\lim_{k\to\infty}\Pr_0\big[x^{(k)}_{4^k t} \in (\alpha, \beta)\big] = \Pr_0\big[x_t \in (\alpha, \beta)\big].$$


Proof: The random variables $x^{(k)}_n - x^{(k)}_{n-1}$ are independent and identically distributed and have mean 0 and variance

$$\sigma^2 = \tfrac12\left(2^{-k}\right)^2 + \tfrac12\left(-2^{-k}\right)^2 = 4^{-k}.$$


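Let $m = 4^k t$. Because $t$ is a dyadic rational, $m$ is an integer for all sufficiently large $k$, and we consider only such $k$. Since

$$x^{(k)}_m = \sum_{n=1}^{m}\left(x^{(k)}_n - x^{(k)}_{n-1}\right),$$

$x^{(k)}_m$ has mean 0 and variance $m/4^k = t$. Hence, by the Central Limit Theorem,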


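$$\lim_{k\to\infty}\Pr\left[\frac{\alpha}{\sqrt t} < \frac{x^{(k)}_m}{\sqrt t} < \frac{\beta}{\sqrt t}\right] = \Phi\!\left(\frac{\beta}{\sqrt t}\right) - \Phi\!\left(\frac{\alpha}{\sqrt t}\right),$$

where $\Phi$ denotes the standard normal distribution function, or

$$\lim_{k\to\infty}\Pr_0\big[\alpha < x^{(k)}_{4^k t} < \beta\big] = \Phi\!\left(\frac{\beta}{\sqrt t}\right) - \Phi\!\left(\frac{\alpha}{\sqrt t}\right).$$

On the other hand, by definition,

$$\Pr_0[x_t \in (\alpha, \beta)] = \int_\alpha^\beta \frac{1}{\sqrt{2\pi t}}\,e^{-y^2/2t}\,dy = \int_{\alpha/\sqrt t}^{\beta/\sqrt t}\frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du = \Phi\!\left(\frac{\beta}{\sqrt t}\right) - \Phi\!\left(\frac{\alpha}{\sqrt t}\right).$$

Therefore,

$$\lim_{k\to\infty}\Pr_0\big[x^{(k)}_{4^k t} \in (\alpha, \beta)\big] = \Pr_0\big[x_t \in (\alpha, \beta)\big].$$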
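Proposition 7-8 can be checked numerically, because the walk's position is an affine function of a binomial count. After $m = 4^k t$ steps with $j$ steps to the right, the position is $(2j - m)2^{-k}$. The sketch below is a minimal illustration, and its function names are ours. It sums exact binomial probabilities with `math.comb` and compares them with the Brownian value $\Phi(\beta/\sqrt t) - \Phi(\alpha/\sqrt t)$, where $\Phi$ is computed from `math.erf`.

```python
from fractions import Fraction
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def walk_probability(k, t, alpha, beta):
    """Exact Pr_0[alpha < x^(k)_{4^k t} < beta] for the walk with steps +/- 2^-k."""
    m = Fraction(t) * 4**k
    if m.denominator != 1:
        raise ValueError("4^k t must be an integer: use a dyadic t and a larger k")
    m = int(m)
    denom = 2**m
    total = 0.0
    for j in range(m + 1):              # j = number of steps to the right
        x = (2 * j - m) / 2**k          # resulting position, exact in binary floating point
        if alpha < x < beta:
            total += comb(m, j) / denom
    return total

def brownian_probability(t, alpha, beta):
    """Pr_0[x_t in (alpha, beta)] for Brownian motion with variance t at time t."""
    s = sqrt(float(t))
    return normal_cdf(beta / s) - normal_cdf(alpha / s)

target = brownian_probability(1, -1, 1)
errors = {k: abs(walk_probability(k, 1, -1, 1) - target) for k in (3, 5, 7)}
print(errors)
```

The error shrinks roughly in proportion to the lattice spacing $2^{-k}$. Most of it comes from the open interval excluding the lattice points that fall exactly on $\alpha$ and $\beta$.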
7. Symmetric random walk in n dimensions


As was mentioned in Section 6, the $n$-dimensional symmetric random walk and $n$-dimensional Brownian motion share a number of properties because the second process is the limit of the first. Of these we shall prove just two for the random walk. The first is that the process is recurrent in dimensions 1 and 2 and transient in dimensions $n \ge 3$. The second is that in the transient case the columns of the $N$-matrix tend to zero, a result in analogy with the behavior of the potential kernel $1/|x|^{n-2}$ in Brownian motion.

For the first problem we note immediately that all states communicate in the random walks of all dimensions; hence each of them is either transient or recurrent. In one dimension the state space is the integers, and

$$P_{i,i+1} = P_{i,i-1} = \tfrac12.$$


Since the mean step is zero, the process is recurrent by Proposition 5-22. A more direct proof of the recurrence proceeds as follows. It is impossible to get from state 2 to state 0 without going through 1. This fact, together with the translation invariance of the hitting probabilities, implies that

$$H_{20} = H_{21}H_{10} = (H_{10})^2.$$

But

$$\bar H_{00} = \tfrac12 H_{10} + \tfrac12 H_{-1,0} = \tfrac12 H_{10} + \tfrac12 H_{01} = H_{10},$$

since $H_{01} = H_{10}$ by symmetry. Therefore, the identity

$$H_{10} = \tfrac12 H_{00} + \tfrac12 H_{20} = \tfrac12 + \tfrac12 (H_{10})^2,$$

which says that $(H_{10} - 1)^2 = 0$, implies that $H_{10} = 1$ and hence $\bar H_{00} = 1$. Consequently, the process is recurrent. Still a third proof can be based on a calculation of $N_{00}$.


In fact, we have

$$P^{(2n)}_{00} = 2^{-2n}\binom{2n}{n},$$

because in order for the process to return in $2n$ steps to 0, it must make $n$ steps to the right and $n$ to the left; each such possibility has probability $2^{-2n}$. By Lemma 1-59,

$$\binom{2n}{n} \sim c\,\frac{2^{2n}}{\sqrt n}.$$

Hence the tail of the series $N_{00} = \sum_n P^{(n)}_{00}$ dominates a constant multiple of the tail of the series $\sum 1/\sqrt n$. Therefore, $N_{00}$ is infinite and the process is recurrent.
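The third proof can be checked numerically. The sketch below is a minimal illustration with names of our choosing. It computes $P^{(2n)}_{00} = 2^{-2n}\binom{2n}{n}$ exactly with `math.comb`. It also checks the Stirling form of Lemma 1-59, in which the constant is $c = 1/\sqrt\pi$, and shows that the partial sums of $N_{00}$ grow without bound. Those partial sums satisfy the standard identity $\sum_{n=0}^{M}P^{(2n)}_{00} = (2M+1)P^{(2M)}_{00}$, which is easy to prove by induction.

```python
from math import comb, pi, sqrt

def return_probability(n):
    """P^(2n)_00 = 2^(-2n) C(2n, n) for the symmetric walk on the integers."""
    return comb(2 * n, n) / 4**n

# Lemma 1-59 with c = 1/sqrt(pi): sqrt(pi n) P^(2n)_00 increases to 1.
ratios = [sqrt(pi * n) * return_probability(n) for n in (10, 100, 1000)]

def partial_sum(M):
    """sum_{n=0}^{M} P^(2n)_00, a partial sum of the series for N_00."""
    return sum(return_probability(n) for n in range(M + 1))

# Equals (2M + 1) P^(2M)_00, so it grows like 2 sqrt(M / pi): N_00 is infinite.
growth = [partial_sum(M) for M in (10, 100, 1000)]
print(ratios, growth)
```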
In the two-dimensional case there are two simultaneous processes going on in perpendicular directions, and for the process to return to the origin, both of them must return to their zero positions at the same time. In $2n$ steps, let $k$ be the number of steps to the right, which must equal the number of steps to the left. Let $n - k$ be the number of steps up, which must equal the number of steps down. Then we have

$$P^{(2n)}_{00} = 4^{-2n}\sum_{k=0}^{n}\frac{(2n)!}{k!\,k!\,(n-k)!\,(n-k)!}.$$

If we multiply numerator and denominator of the multinomial coefficient by $(n!)^2$, we see that it equals $\binom{2n}{n}\binom{n}{k}^2$. Thus

$$P^{(2n)}_{00} = 4^{-2n}\binom{2n}{n}\sum_{k=0}^{n}\binom{n}{k}^2.$$

The identity

$$\sum_{k=0}^{n}\binom{n}{k}^2 = \binom{2n}{n}$$

shows that

$$P^{(2n)}_{00} = 4^{-2n}\binom{2n}{n}^2.$$

Since

$$\binom{2n}{n} \sim c\,\frac{2^{2n}}{\sqrt n},$$

we have

$$P^{(2n)}_{00} \sim \frac{c^2}{n}.$$

Thus the series $N_{00} = \sum_n P^{(2n)}_{00}$ dominates a multiple of $\sum 1/n$, $N_{00}$ must be infinite, and the process is recurrent.

An alternate proof that the process is recurrent in two dimensions proceeds as follows. If we introduce the new coordinates

$$u = x + y, \qquad v = x - y,$$

then the two-dimensional symmetric random walk described in the coordinates $(u, v)$ executes two one-dimensional symmetric random walks independently of each other. Hence $P^{(2n)}_{(0,0),(0,0)}$ is the probability that $u = 0$ and $v = 0$ after $2n$ steps. This probability is $\big(P^{(2n)}_{00}\big)^2 \sim c^2/n$, where on the right $P^{(2n)}_{00}$ refers to the one-dimensional walk.
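Both two-dimensional arguments can be verified exactly for small $n$. The sketch below is a minimal illustration, and its function names are ours. It compares three quantities for each $n$:

- the multinomial sum;
- the squared one-dimensional return probability given by the $(u, v)$ argument;
- a brute-force propagation of the planar walk's distribution in exact rational arithmetic.

It also confirms that in $(u, v)$ coordinates each planar step is one of the four moves $(\pm1, \pm1)$.

```python
from fractions import Fraction
from math import comb, factorial

PLANAR_MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def multinomial_form(n):
    """4^(-2n) sum_k (2n)! / (k! k! (n-k)! (n-k)!)."""
    total = sum(factorial(2 * n) // (factorial(k) ** 2 * factorial(n - k) ** 2)
                for k in range(n + 1))
    return Fraction(total, 16**n)

def squared_one_dimensional(n):
    """(2^(-2n) C(2n, n))^2: both independent walks u and v are back at 0."""
    return Fraction(comb(2 * n, n), 4**n) ** 2

def planar_return_probability(steps):
    """Exact P^(steps)_{(0,0),(0,0)} by propagating the planar walk's distribution."""
    dist = {(0, 0): Fraction(1)}
    for _ in range(steps):
        nxt = {}
        for (x, y), p in dist.items():
            for dx, dy in PLANAR_MOVES:
                key = (x + dx, y + dy)
                nxt[key] = nxt.get(key, Fraction(0)) + p / 4
        dist = nxt
    return dist.get((0, 0), Fraction(0))

# In the coordinates u = x + y, v = x - y each planar step becomes (+/-1, +/-1).
uv_steps = sorted((dx + dy, dx - dy) for dx, dy in PLANAR_MOVES)
print(uv_steps, [multinomial_form(n) for n in range(4)])
```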
In three dimensions we calculate $N_{00}$. We have

$$P^{(2n)}_{00} = 6^{-2n}\sum_{j+k\le n}\frac{(2n)!}{j!\,j!\,k!\,k!\,(n-j-k)!\,(n-j-k)!} = 2^{-2n}\binom{2n}{n}\sum_{j+k\le n}\left(3^{-n}\frac{n!}{j!\,k!\,(n-j-k)!}\right)^2.$$

The coefficients $\binom{n}{j,\,k,\,n-j-k}$ are dominated by the central term

$$\binom{n}{n/3,\;n/3,\;n/3},$$

where the gamma function may be used for $(n/3)!$ if 3 does not divide $n$. This fact and the observation that the coefficients $\binom{n}{j,\,k,\,n-j-k}$ sum to $3^n$ imply that

$$P^{(2n)}_{00} \le 2^{-2n}\binom{2n}{n}\,3^{-n}\binom{n}{n/3,\;n/3,\;n/3}.$$

Summing on $n$ and using the approximations of Section 1-6a, we see that the series $N_{00} = \sum_n P^{(2n)}_{00}$ is dominated by a multiple of $\sum 1/n^{3/2}$. Therefore $N_{00}$ is finite, and the process is transient.
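The three-dimensional estimate can be tested the same way. The sketch below is a minimal illustration, and its function names are ours. It computes the exact formula for $P^{(2n)}_{00}$ and checks it against a brute-force propagation of the walk for a few steps. It then checks the central-term bound and shows that the partial sums of $N_{00}$ stay bounded. The bound uses the largest integer trinomial coefficient rather than the gamma-function central term. That choice only makes the bound tighter, so the inequality in the text is also confirmed.

```python
from fractions import Fraction
from math import comb, factorial

def trinomial(n, j, k):
    """n! / (j! k! (n-j-k)!)"""
    return factorial(n) // (factorial(j) * factorial(k) * factorial(n - j - k))

def index_pairs(n):
    return [(j, k) for j in range(n + 1) for k in range(n - j + 1)]

def return_probability_3d(n):
    """Exact P^(2n)_00 = 2^(-2n) C(2n, n) sum_{j+k<=n} (3^(-n) trinomial(n, j, k))^2."""
    s = sum(trinomial(n, j, k) ** 2 for j, k in index_pairs(n))
    return Fraction(comb(2 * n, n), 4**n) * Fraction(s, 9**n)

def central_bound(n):
    """2^(-2n) C(2n, n) 3^(-n) times the largest trinomial coefficient."""
    biggest = max(trinomial(n, j, k) for j, k in index_pairs(n))
    return Fraction(comb(2 * n, n), 4**n) * Fraction(biggest, 3**n)

def lattice_return_probability(steps):
    """Brute-force return probability of the 3-D walk (practical only for small `steps`)."""
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    dist = {(0, 0, 0): Fraction(1)}
    for _ in range(steps):
        nxt = {}
        for pos, p in dist.items():
            for mv in moves:
                key = tuple(a + b for a, b in zip(pos, mv))
                nxt[key] = nxt.get(key, Fraction(0)) + p / 6
        dist = nxt
    return dist.get((0, 0, 0), Fraction(0))

partial_sums, running = [], 0.0
for n in range(41):
    running += float(return_probability_3d(n))
    partial_sums.append(running)
print(partial_sums[-1])
```

The partial sums increase but level off. The full series $N_{00}$ is known to be about 1.516, so every partial sum stays below 1.52.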

If any higher-dimensional random walk were recurrent, then the process projected onto three of its coordinates would be recurrent, and the latter process watched only when it changes state would also be recurrent. But this last process is exactly the three-dimensional symmetric walk. We conclude therefore that the random walk in all dimensions greater than three is transient, and we have completed the proof of the following proposition.


Proposition 7-9: The symmetric random walk is recurrent in dimen- 
sions one and two and is transient in all dimensions greater than or 
equal to three. 


In the transient case of dimension $n \ge 3$, the $j$th entry in the 0th column of the $N$-matrix is of the order of a constant times $|j|^{-(n-2)}$. We conclude this chapter by proving the weaker result that the entries of that column tend to zero, but our proof will be for a more general situation.


Proposition 7-10: Let $P$ be a Markov chain with an infinite state space such that

(1) $P$ is transient,

(2) $P = P^{T}$,

(3) $P$ has only finitely many non-zero entries in each row.

Then

$$\lim_{j\to\infty} H_{0j} = 0.$$

If, in addition, $P$ represents sums of independent random variables, then

$$\lim_{j\to\infty} N_{0j} = 0.$$

In particular, the conclusions apply to the symmetric random walk in any dimension $n \ge 3$.


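Proof: We note that hypothesis (3) is equivalent to

(3′) For any state 0 and for any given $m$, there exist only finitely many states $j$ for which $P^{(k)}_{0j} \ne 0$ for some $k \le m$.

If the conclusion about $H_{0j}$ is false, then for some $\varepsilon > 0$ and for infinitely many $j$ we have $H_{0j} > \varepsilon$. By (1), $N$ is finite-valued. Therefore,

$$\lim_{k\to\infty} P^k N = \lim_{k\to\infty}\big[N - (I + \cdots + P^{k-1})\big] = 0$$

and

$$\lim_{k\to\infty} P^k H = 0,$$

since $0 \le H \le N$. Choose $m$ large enough so that $(P^m H)_{00} < \varepsilon^2/N_{00}$. Since there are infinitely many $j$ such that $H_{0j} > \varepsilon$, we can find by (3′) such a $j$ with $P^{(k)}_{0j} = 0$ for all $k \le m$. By (2), $N$ is symmetric. Since $N_{0j} = H_{0j}N_{jj} \ge H_{0j}$, this gives $H_{j0} = N_{j0}/N_{00} = N_{0j}/N_{00} \ge H_{0j}/N_{00}$. Then

$$\Pr_0[\text{hit 0 after time } m] \ge \Pr_0[\text{ever hit } j \text{ and return to } 0] = H_{0j}H_{j0} \ge \frac{H_{0j}^2}{N_{00}} > \frac{\varepsilon^2}{N_{00}}.$$

But

$$\Pr_0[\text{hit 0 after time } m] = (P^m H)_{00} < \frac{\varepsilon^2}{N_{00}},$$

a contradiction. Therefore $H_{0j} \to 0$.

Finally, if $P$ represents sums of independent random variables,

$$N_{0j} = H_{0j}N_{jj} = H_{0j}N_{00}.$$

Hence $N_{0j} \to 0$. (Note that we really need only that $N_{jj}$ is bounded.)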


As Markov chains the symmetric random walks have some special properties, reflecting corresponding special properties of Brownian motion. For instance, $\alpha$ is a constant for the random walk, and $P = P^{T}$. Consequently $P = \hat P$. We shall see that although many results of classical potential theory generalize to all transient and most recurrent chains, some will require further assumptions which happen to be true for symmetric random walks.


CHAPTER 8 


TRANSIENT POTENTIAL THEORY 


1. Potentials 


In this chapter $P$ denotes a Markov chain all of whose states are transient, that is, a transient chain with no absorbing states. Every such chain has at least one (strictly) positive superregular measure, as we saw in Chapter 5; for example, the sum of $2^{-i}$ times the $i$th row of $N$ is such a measure.

We select one such positive superregular measure, to be fixed throughout the chapter, and call it $\alpha$. All of transient potential theory will be relative to the distinguished vector $\alpha$.

Let $\hat P$ be the $\alpha$-dual of $P$. Since all states are transient in $\hat P$ and since $\hat{\hat P} = P$, we see that $\hat P$ is the most general chain of the type we consider. The distinguished measure for $\hat P$ is taken to be the same $\alpha$.

As an example, let $P$ be a transient Markov chain whose states communicate. Then $P\mathbf{1} \le \mathbf{1}$ and $0 < N = \sum_{n=0}^{\infty} P^n < \infty$. Every non-negative non-zero superregular row vector $\beta$ is positive. Indeed, if $\beta_j \ne 0$, then for every state $i$ and integer $k$

$$\beta_i \ge (\beta P^k)_i = \sum_n \beta_n P^{(k)}_{ni} \ge \beta_j P^{(k)}_{ji}.$$

The right side must be positive for some $k$, since $j$ communicates with $i$. Thus in this special case any non-negative non-zero superregular row vector may be taken as $\alpha$; in particular, $\alpha$ may be taken as a row of $N$.

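In the general case, if $P\mathbf{1} \ne \mathbf{1}$, we have defined the enlarged chain by adding an absorbing state $a$ to $P$ and by setting

$$P_{ij} \text{ as before if } i \ne a \text{ and } j \ne a, \qquad P_{ia} = 1 - \sum_k P_{ik}, \qquad P_{aj} = \delta_{aj}.$$

If $P\mathbf{1} = \mathbf{1}$, we shall agree that $P$ is its own enlarged chain.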
It will be convenient to say that the product of a row vector and a
column vector is finite when we mean that it is well defined and finite. 
We recall the terminology function for column vector and signed 
measure for row vector. 


Definition 8-1: If $\mu$ is a signed measure with $\mu\mathbf{1}$ finite and if

$$\nu = \lim_{n\to\infty}\big[\mu(I + P + \cdots + P^{n-1})\big]$$

exists and is finite-valued, then $\mu$ is called a left charge with potential measure $\nu$. If $f$ is a function with $\alpha f$ finite and if

$$g = \lim_{n\to\infty}\big[(I + P + \cdots + P^{n-1})f\big]$$

exists and is finite-valued, then $f$ is called a right charge with potential function $g$. In either case a pure potential is a potential of a non-negative charge.


The condition that $\alpha f$ be finite is the natural analog of the classical theory as described in Chapter 7. It states that $f$ is integrable with respect to the distinguished superregular measure $\alpha$. Similarly, the condition on $\mu$ is that the distinguished superregular function $\mathbf{1}$ be integrable with respect to $\mu$.

Potential functions have a simple probabilistic interpretation in terms of games. Let $f$ be a payment function, where $f_j$ is the payment a player receives each time he is in state $j$. Then $P^n f$ gives the expected payment on the $n$th step. Thus $(I + P + \cdots + P^{n-1})f$ is the expected total payment before time $n$, and the potential $g$ is the expected total payment in the long run. It is clear intuitively that $g_i$ should equal $\sum_j N_{ij}f_j$, and we now prove this result.
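As a small sanity check of $g_i = \sum_j N_{ij} f_j$, the sketch below uses a hypothetical five-state chain. It is the symmetric walk on $\{0, \dots, 4\}$ that passes to the absorbing state $a$ whenever it steps off either end, so $P\mathbf 1 \le \mathbf 1$ and every state is transient. For a finite transient chain $N = \sum_n P^n = (I - P)^{-1}$, which the helper `inverse` computes exactly with Python's `fractions` module. The helper names and the payment vector are ours, chosen for illustration. Because this $P$ is symmetric, its column sums are at most 1. Hence $\alpha = \mathbf{1}^T$ is a positive superregular measure, and every $f$ on five states is $\alpha$-integrable.

```python
from fractions import Fraction as Fr

def eye(n):
    """n x n identity matrix with exact rational entries."""
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matvec(A, v):
    return [sum((a * x for a, x in zip(row, v)), Fr(0)) for row in A]

def inverse(A):
    """Gauss-Jordan elimination over the rationals; A is assumed invertible."""
    n = len(A)
    M = [list(row) + eye(n)[i] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def killed_walk(n):
    """Hypothetical example: symmetric walk on {0, ..., n-1}; a step off either
    end goes to the absorbing state a, so P1 <= 1 and every state is transient."""
    P = [[Fr(0)] * n for _ in range(n)]
    for i in range(n - 1):
        P[i][i + 1] = P[i + 1][i] = Fr(1, 2)
    return P

P = killed_walk(5)
N = inverse(sub(eye(5), P))                 # N = sum_n P^n = (I - P)^(-1)
f = [Fr(3), Fr(-1), Fr(0), Fr(2), Fr(-5)]   # hypothetical signed payment function
g = matvec(N, f)                            # g_i = sum_j N_ij f_j

# Expected payment before time n, (I + P + ... + P^(n-1)) f, accumulated in floating point.
P_float = [[float(x) for x in row] for row in P]
partial, term = [0.0] * 5, [float(x) for x in f]
for _ in range(500):
    partial = [s + t for s, t in zip(partial, term)]
    term = [sum(p * t for p, t in zip(row, term)) for row in P_float]
gap = max(abs(s - float(x)) for s, x in zip(partial, g))
print(g, gap)
```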


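Lemma 8-2: If $\mu\mathbf{1}$ is finite, then $\mu N$ is finite-valued. If $\alpha f$ is finite, then $Nf$ is finite-valued.

Proof: We have

$$N_{ij} = H_{ij}N_{jj} \le N_{jj}.$$

Thus

$$|(\mu N)_j| \le \sum_i |\mu_i|\,N_{ij} \le \sum_i |\mu_i|\,N_{jj} = (|\mu|\mathbf{1})\,N_{jj} < \infty.$$

For the second half let $\mu = \operatorname{dual} f$ and apply the first result to $\hat P$, noting that $\mu\mathbf{1} = \alpha f$. Then

$$\infty > (|\mu|\hat N)_j = \sum_i |f_i|\,\alpha_i\,\frac{\alpha_j N_{ji}}{\alpha_i} = \alpha_j\,(N|f|)_j.$$

Since $\alpha_j \ne 0$, $Nf$ is finite-valued.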
Theorem 8-3: If $\alpha f$ is finite, then $f$ is a charge and its potential is $g = Nf$.

Proof: By Lemma 8-2, $Nf$ is well defined and finite-valued. Hence so are both $Nf^+$ and $Nf^-$. By monotone convergence,

$$\lim_{n\to\infty}\big[(I + P + \cdots + P^{n-1})f^+\big] = Nf^+$$

and

$$\lim_{n\to\infty}\big[(I + P + \cdots + P^{n-1})f^-\big] = Nf^-.$$

Thus

$$\lim_{n\to\infty}\big[(I + P + \cdots + P^{n-1})f\big] = Nf^+ - Nf^- = Nf.$$

Thus $f$ is a charge if and only if it is integrable with respect to $\alpha$, and $N$ is the potential operator that transforms a charge into its potential. In particular, $f$ is a charge for $P$ if and only if it is a charge for $\hat P$. We shall now show that $I - P$ is the inverse operator.


Theorem 8-4: If $g$ is a potential, then $(I - P)g$ is its charge.

Proof: Let $f$ be a charge with potential $g$. By Theorem 8-3, $g = Nf$. Hence, by Lemma 5-9, $(I - P)g = f$.


Therefore, there is a one-to-one correspondence between charges and 
potentials. Note that Theorem 8-4 implies that a potential is regular 
at all states where the charge is zero. 
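Theorem 8-4, and the remark that a potential is regular wherever its charge vanishes, can be seen on the same hypothetical five-state killed walk used for illustration above. The helpers are ours and are repeated so the block runs on its own. In the sketch below the charge sits on the end states 0 and 4, and applying $I - P$ to $g = Nf$ recovers $f$ exactly. The potential is regular at the three interior states, and, since the charge is non-negative, it is superregular everywhere.

```python
from fractions import Fraction as Fr

def eye(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matvec(A, v):
    return [sum((a * x for a, x in zip(row, v)), Fr(0)) for row in A]

def inverse(A):
    """Gauss-Jordan elimination over the rationals; A is assumed invertible."""
    n = len(A)
    M = [list(row) + eye(n)[i] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def killed_walk(n):
    """Hypothetical example chain: symmetric walk on {0, ..., n-1}, absorbed off either end."""
    P = [[Fr(0)] * n for _ in range(n)]
    for i in range(n - 1):
        P[i][i + 1] = P[i + 1][i] = Fr(1, 2)
    return P

P = killed_walk(5)
I = eye(5)
N = inverse(sub(I, P))
f = [Fr(1), Fr(0), Fr(0), Fr(0), Fr(2)]     # hypothetical charge with support {0, 4}
g = matvec(N, f)                            # its potential (Theorem 8-3)
recovered = matvec(sub(I, P), g)            # Theorem 8-4: (I - P) g = f
Pg = matvec(P, g)
regular_states = [i for i in range(5) if Pg[i] == g[i]]
print(g, regular_states)
```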

The method used to derive the second half of Lemma 8-2 is of general importance. We prove a result for all our $P$'s for signed measures (or functions). We apply the result to $\hat P$ and obtain a corresponding result for functions (or signed measures). Then since $\hat P$ is the form of the most general transient chain being considered, the new result holds for all $P$'s. Such results will loosely be described as duals.

The duals of Theorems 8-3 and 8-4 state that a signed measure $\mu$ is a charge if and only if $\mu\mathbf{1}$ is finite. Its potential is $\nu = \mu N$, and $\mu = \nu(I - P)$.

From now on we shall prove theorems only for functions; the dual results for signed measures can always be proved by the indicated method. The key to the success of the method is that the dual of a right charge for $P$ is a left charge for $\hat P$, and the dual of a potential function for $P$ is a potential measure for $\hat P$.

From Theorem 8-3 we see immediately that the class of potentials is 
quite extensive. We can even prove that there exists a strictly positive 
pure potential, a result we shall need later on. 
Proposition 8-5: There exists a strictly positive pure potential.

Proof: Number the states $1, 2, \ldots$ and let $f_j = 2^{-j}/\alpha_j$. Then

$$\alpha f = \sum_j \alpha_j\,\frac{2^{-j}}{\alpha_j} = \sum_j 2^{-j} = 1,$$

so that $f$ is the charge of a pure potential $g$, by Theorem 8-3. Furthermore,

$$g_i = \sum_j N_{ij}f_j \ge N_{ii}f_i \ge f_i > 0,$$

so that $g$ is strictly positive.


For many purposes it is sufficient in studying potentials to consider 
only pure potentials. The reason for this simplification is the following. 


Proposition 8-6: Any potential may be represented as the difference 
of two pure potentials. 


Proof: Write $g = Nf = Nf^+ - Nf^-$.


Note by Theorem 8-4 that a potential is superregular if and only if 
it is a pure potential. 

We recall from Theorem 5-10 that a non-negative superregular function $h$ is uniquely representable as $h = Nf + r$ with $r$ regular. In the representation, $r = \lim_n P^n h \ge 0$ and $f = (I - P)h \ge 0$. (The dual of this result allows the unique representation of a non-negative superregular measure $\pi$ as $\pi = \mu N + \rho$ with $\rho$ regular. In this representation, $\rho = \lim_n \pi P^n \ge 0$ and $\mu = \pi(I - P)$.) This result
is the analog of a classical theorem due to F. Riesz: In any open set of 
Euclidean space which corresponds to a transient version of Brownian 
motion, any non-negative superharmonic function is uniquely the sum 
of a pure potential for the region and a non-negative harmonic function. 
The pure potential may have infinite total charge. We now generalize 
the Markov chain result, and in so doing we obtain a useful necessary 
condition that potentials must satisfy. 


Proposition 8-7: If $(I - P)h = f$, if $h$ is finite-valued, and if $Nf$ is finite-valued, then $h$ has a representation in the form

$$h = Nf + r$$

with $r$ regular. The vector $r$ satisfies $r = \lim_n P^n h$.
Proof: Set $r = h - Nf$. Then

$$\begin{aligned}(I - P)(h - Nf) &= (I - P)h - (I - P)(Nf)\\ &= (I - P)h - f \qquad \text{by Lemma 5-9}\\ &= f - f = 0.\end{aligned}$$

Hence $r$ is regular. Now

$$f = (I - P)h = h - Ph,$$

or

$$h = Ph + f.$$

Since $Nf$ is finite-valued and since $P^n f^+ \le Nf^+$ and $P^n f^- \le Nf^-$, $P^n f$ is finite-valued for every $n$. By induction we see that

$$P^{k-1}h = P^k h + P^{k-1}f$$

and that $P^k h$ is finite-valued. Summing for $k = 1, \ldots, n$, we obtain

$$h = P^n h + (I + P + \cdots + P^{n-1})f.$$

By dominated convergence the second term tends to $Nf$. Hence

$$h = \lim_n P^n h + Nf.$$


Corollary 8-8: If $(I - P)h = f$, if $h$ is finite-valued, and if $\alpha f$ is finite, then $h$ is a potential if and only if $\lim_n P^n h = 0$.

Proof: By Theorem 8-3, $f$ is a charge and $Nf$ is its potential. Apply Proposition 8-7 and write

$$h = Nf + \lim_n P^n h.$$

If $\lim_n P^n h = 0$, then $h$ is the potential of $f$. Conversely, if $\lim_n P^n h \ne 0$, then $h$ cannot be a potential because, by Theorem 8-4, it would have to have $f$ as its charge.


Corollary 8-9: If $g$ is a potential, then $\lim_n P^n g = 0$.

Proof: Take $h = g$ in Corollary 8-8.


In the discrete analog of the classical case, namely the three-dimensional symmetric random walk, every potential $g$ is bounded and satisfies $\lim_i g_i = 0$. In our theory we obtain only the weaker result, Corollary 8-9; that $g$ may be unbounded will be shown in Section 7.

The stronger results of the classical theory are due to special features, as the next proposition shows. In the classical case $\alpha$ is chosen as $\mathbf{1}^T$, and $N_{ii}$ is independent of $i$. Hence $\alpha_i \ge kN_{ii}$ for some positive constant $k$. Furthermore, $\lim_i N_{ij} = 0$ by Proposition 7-10.


Proposition 8-10: If $\alpha$ is chosen so that $\alpha_i \ge kN_{ii}$ for all $i$, with $k$ a positive constant, then all potentials are bounded. If, in addition, $\lim_i N_{ij} = 0$ for every $j$, then $\lim_i g_i = 0$ for every potential $g$.

Proof: The dual of $N_{ji} \le N_{ii}$ is $N_{ij} \le (N_{ii}/\alpha_i)\,\alpha_j$. If $\alpha_i \ge kN_{ii}$, then for $g = Nf$

$$|g_i| \le \sum_j N_{ij}|f_j| \le \frac{N_{ii}}{\alpha_i}\sum_j \alpha_j|f_j| \le \frac{1}{k}\,\alpha|f|.$$

Thus $g$ is bounded. Now suppose $\lim_i N_{ij} = 0$. By Proposition 8-6 we may assume that $g$ is a pure potential. Define a sequence of functions $h^{(i)}_j = N_{ij}/\alpha_j$ and a measure $\mu_j = \alpha_j f_j$. Then $\mu$ is a finite measure, and

$$h^{(i)}_j = \frac{N_{ij}}{\alpha_j} \le \frac{N_{ii}}{\alpha_i} \le \frac{1}{k}$$

is bounded independently of $i$ and $j$. Hence, by dominated convergence,

$$\lim_i g_i = \lim_i \sum_j h^{(i)}_j\,\mu_j = \sum_j \big(\lim_i N_{ij}\big) f_j = 0.$$

Both conditions of Proposition 8-10 hold for the basic example with $\alpha$ chosen as $\beta$. However, only the first condition holds for the reverse of the basic example (see Section 6). In Section 7 we shall see an example where both conditions fail.


2. The h-process and some applications 


Duality is a transformation which interchanges the roles of row and 
column vectors. Our purpose now is to describe a useful transformation 
of transient chains into new transient chains in which row and column 
vectors are transformed into vectors of the same type. 


Definition 8-11: Let $h$ be a positive finite-valued superregular function for a transient chain $P$. The h-process is a Markov chain $P^*$ with transition probabilities

$$P^*_{ij} = \frac{P_{ij}\,h_j}{h_i}.$$

It is left to the reader to verify that $P^*$ is a transition matrix, that all states are transient, and that if the states of $P$ communicate, the same is true for $P^*$. Let $U$ be a diagonal matrix with diagonal entries $1/h_i$. Then $P^* = UPU^{-1}$.
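The h-process can be computed explicitly on the hypothetical five-state killed walk used above; the helpers are ours and are repeated so the block runs on its own. In the sketch, $h$ is the 0th column of $N$, a positive pure potential whose charge sits at state 0. The code builds $P^*_{ij} = P_{ij}h_j/h_i$, checks that its rows sum to at most 1, and confirms that the fundamental matrix transforms as $N^* = UNU^{-1}$. The rows sum to exactly 1 except at state 0, so this h-process can leave the chain only from the support of the charge, in line with Proposition 8-14.

```python
from fractions import Fraction as Fr

def eye(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inverse(A):
    """Gauss-Jordan elimination over the rationals; A is assumed invertible."""
    n = len(A)
    M = [list(row) + eye(n)[i] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def killed_walk(n):
    """Hypothetical example chain: symmetric walk on {0, ..., n-1}, absorbed off either end."""
    P = [[Fr(0)] * n for _ in range(n)]
    for i in range(n - 1):
        P[i][i + 1] = P[i + 1][i] = Fr(1, 2)
    return P

P = killed_walk(5)
n = len(P)
N = inverse(sub(eye(n), P))
h = [N[i][0] for i in range(n)]    # column 0 of N: a positive pure potential, charge at state 0

P_star = [[P[i][j] * h[j] / h[i] for j in range(n)] for i in range(n)]   # P*_ij = P_ij h_j / h_i
row_sums = [sum(row, Fr(0)) for row in P_star]                           # equals (Ph)_i / h_i
N_star = inverse(sub(eye(n), P_star))
print(h, row_sums)
```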
Definition 8-12: The h-process transformation is a transformation defined on square matrices $Y$, row vectors $\eta$, and column vectors $f$ by

$$Y^* = UYU^{-1}, \qquad \eta^* = \eta U^{-1}, \qquad f^* = Uf.$$

The h-process transformation yields results similar to those with duality. If $P$ is a transient chain, then $P^*$ also is transient. Moreover, powers of $P$ transform to the $P^*$-process the same way that $P$ does, and the fundamental matrix for $P^*$ is $N^*$. Sums and products are preserved in their given order; and equalities, inequalities, and limits are preserved entry-by-entry. Any superregular function (or signed measure) for $P$ transforms into a superregular function (or signed measure) for $P^*$.

If $\alpha$ is the distinguished superregular measure for $P$, we select $\alpha^*$ as the distinguished superregular measure for $P^*$. Then $\alpha^* f^* = \alpha f$ and $N^* f^* = (Nf)^*$. Hence if $f$ is a right charge with potential $g$ in $P$, then $f^*$ is a right charge for $P^*$ with potential $g^*$.

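If we decompose $P$ as

$$P = \begin{pmatrix} T & U \\ R & Q \end{pmatrix},$$

with the states of $E$ listed first and those of $\tilde E$ second, then

$$B^E = \begin{pmatrix} I & 0 \\ \left(\sum_n Q^n\right)R & 0 \end{pmatrix}.$$

From this decomposition we see that $(B^E)^*$ is the $B^E$-matrix of the $P^*$-chain because $Q$ and $R$ transform into $Q^*$ and $R^*$, because products are preserved, and because $I^* = I$.

We shall now give some applications of the h-process.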


Definition 8-13: The support of a charge is the set on which the charge is not 0; the support of a potential is the support of its charge. A charge or potential is said to have support in $E$ if its support is a subset of $E$.


The function $\mathbf{1}$ is always superregular, and hence by the representation theorem $\mathbf{1} = Nf + r$, where $f = (I - P)\mathbf{1}$ and $r$ is regular. That is,

$$f_i = 1 - (P\mathbf{1})_i = P_{ia}$$

in the enlarged chain. Moreover,

$$(Nf)_i = \sum_j N_{ij}P_{ja} = B_{ia}.$$

Thus $r_i = 1 - B_{ia}$ is the probability that the $P$-process, started at $i$, continues indefinitely. The enlarged chain is absorbing if and only if $\mathbf{1} = Nf$.


Proposition 8-14: Let $g > 0$ be a pure potential with support in $E$, and let $P^*$ be the h-process for $h = g$. Then $g^* = \mathbf{1}$ is a potential, $P^*$ is absorbing, and $(B^E)^*\mathbf{1} = \mathbf{1}$.

Proof: Potentials transform into potentials; hence $g^* = \mathbf{1}$ is a potential. Since $\mathbf{1}$ is then of the form $N^*f^*$, $P^*$ is absorbing. The absorbing state $a$ can be reached only from a state $i$ such that $(P^*\mathbf{1})_i < 1$; for such a state $i$,

$$0 < \big[(I - P^*)\mathbf{1}\big]_i = \big[(I - P^*)g^*\big]_i = f^*_i.$$

Hence $f_i > 0$ and $i$ must be in $E$. Thus the $P^*$-process with probability one reaches $E$ from all states, and $(B^E)^*\mathbf{1} = \mathbf{1}$.


What underlies Proposition 8-14 is this: the h-process tends to follow paths along which $h$ is large. But potentials tend to zero on the average ($P^n h \to 0$ for a potential). Hence, if $h$ is a potential, the paths in the h-process disappear. See Chapter 10 for details.


Proposition 8-15: If $h$ is a non-negative finite-valued superregular function, then $B^E h \le h$ for any set $E$.

Proof: First suppose that $h > 0$. Form the h-process; then $h^* = \mathbf{1}$. Since $(B^E)^*\mathbf{1} \le \mathbf{1}$, we have $B^E h \le h$. (We shall often conclude that an inequality for the h-process implies an inequality for the original process. If this conclusion were false, the inequality $B^E h \le h$ would fail in some entry. But the h-process transformation preserves inequalities entry-by-entry.)

Now suppose that $h$ has some zero entries. Apply the special case above to the function $h + \varepsilon\mathbf{1}$. Then

$$B^E(h + \varepsilon\mathbf{1}) \le h + \varepsilon\mathbf{1}.$$

Letting $\varepsilon$ tend to zero, we obtain $B^E h \le h$.


Proposition 8-16: If $h$ is a non-negative superregular function and if $E$ is any set of states, then $\bar h = B^E h$ satisfies the following:

(1) $\bar h \le h$ and $\bar h_E = h_E$.
(2) $\bar h$ is the pointwise infimum of all non-negative superregular functions which dominate $h$ on $E$.
(3) If $\bar f = (I - P)\bar h$, then
$$\bar f_E = (I - P^E)h_E \ge 0 \quad\text{and}\quad \bar f_{\tilde E} = 0.$$
Therefore, $\bar h$ is regular on $\tilde E$ and is superregular everywhere.
(4) If $E \subset F$, then $B^E h \le B^F h$.

Proof: Statement (1) follows from Proposition 8-15 and the fact that $B^E_{ij} = \delta_{ij}$ for $i$ in $E$. For (2) let $x$ be a non-negative superregular function such that $x_E \ge h_E$. Since the $\tilde E$ columns of $B^E$ are zero,

$$x \ge B^E x \ge B^E h = \bar h,$$

and (2) holds provided we show in (3) that $\bar h$ is superregular. For (3), since $P\bar h \le Ph \le h$ is finite-valued,

$$\bar f = (I - P)(B^E h) = \big[(I - P)B^E\big]h = \begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} h_E \\ h_{\tilde E} \end{pmatrix}.$$

But $h_E$ is $P^E$-superregular by the dual of Lemma 6-7, and (3) follows. Finally for (4), if $E \subset F$, then by conclusion (6) of Proposition 5-8,

$$B^E h = B^F(B^E h) \le B^F h.$$

We now prove two lemmas and a proposition which conclude that a charge and its potential may both be computed from a knowledge of the values of the potential on the support. The first lemma is interesting in itself because of its game interpretation, which we shall discuss after proving the result.

Lemma 8-17: For any set of states $E$,

$$N = B^E N + {}^E N.$$

If $g$ is a potential with charge $f$, then

$$g = B^E g + {}^E N f.$$

Proof: In Theorem 4-11, let $f_j$ be the number of times in $j$ when and after $E$ is reached (or 0 if $E$ is not reached), and let $t$ be the time when $E$ is reached (or $+\infty$ if $E$ is not reached). Then Theorem 4-11 yields

$$M_i[f_j] = \sum_k \Pr_i[x_t = k]\,M_k[n_j] = \sum_k B^E_{ik}N_{kj}.$$

But $f_j$ is the difference of the total number of times the process is in $j$ and the number of times it is in $j$ before reaching $E$. Hence

$$M_i[f_j] = N_{ij} - {}^E N_{ij},$$

and the first equation follows. To get the second equation we multiply through by $f$; associativity in $B^E N f$ holds because $B^E N|f|$ is finite-valued.


In the game interpretation of potentials, $f_j$ is a payment received each time the process is in state $j$, and $g_i$ is the expected total gain if the process is started in $i$. The second equation of Lemma 8-17 splits $g$ into two parts: the expected gain when and after $E$ is entered, and the expected gain before $E$ is entered. If all the payments are non-negative, then it is obvious from this interpretation that $g \ge B^E g$. If the support of $f$ is in $E$, then all non-zero payments occur in $E$, and the expected gain before reaching $E$ is zero. Hence, as we shall see formally in Proposition 8-19, $g = B^E g$.
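Lemma 8-17 can be checked exactly on the hypothetical five-state killed walk used above; the helpers, including `hitting_and_taboo`, are ours. For $i \notin E$ the sketch computes $B^E$ and ${}^E N$ from the chain stopped on $E$. If $Q$ is $P$ restricted to $\tilde E$, then ${}^E N$ on $\tilde E$ is $(I - Q)^{-1}$, and $B^E_{ij} = \sum_{k \in \tilde E}\big[(I - Q)^{-1}\big]_{ik}P_{kj}$ for $j$ in $E$. The sketch then verifies both identities of the lemma. It also verifies that a potential with support in $E$ satisfies $g = B^E g$, in line with the game interpretation.

```python
from fractions import Fraction as Fr

def eye(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fr(0))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum((a * x for a, x in zip(row, v)), Fr(0)) for row in A]

def inverse(A):
    """Gauss-Jordan elimination over the rationals; A is assumed invertible."""
    n = len(A)
    M = [list(row) + eye(n)[i] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def killed_walk(n):
    """Hypothetical example chain: symmetric walk on {0, ..., n-1}, absorbed off either end."""
    P = [[Fr(0)] * n for _ in range(n)]
    for i in range(n - 1):
        P[i][i + 1] = P[i + 1][i] = Fr(1, 2)
    return P

def hitting_and_taboo(P, E):
    """B^E (where E is first reached, time 0 included) and ^E N (visits before E is reached)."""
    n = len(P)
    comp = [i for i in range(n) if i not in E]
    B = [[Fr(0)] * n for _ in range(n)]
    taboo = [[Fr(0)] * n for _ in range(n)]
    for i in E:
        B[i][i] = Fr(1)
    if comp:
        NQ = inverse(sub(eye(len(comp)), [[P[i][j] for j in comp] for i in comp]))
        for a, i in enumerate(comp):
            for b, j in enumerate(comp):
                taboo[i][j] = NQ[a][b]
            for j in E:
                B[i][j] = sum((NQ[a][b] * P[k][j] for b, k in enumerate(comp)), Fr(0))
    return B, taboo

P = killed_walk(5)
n = len(P)
N = inverse(sub(eye(n), P))
E = [1, 3]                                           # hypothetical set of states
B, taboo_N = hitting_and_taboo(P, E)
BN = matmul(B, N)
first_identity = all(N[i][j] == BN[i][j] + taboo_N[i][j] for i in range(n) for j in range(n))

f = [Fr(2), Fr(0), Fr(-1), Fr(0), Fr(1)]             # hypothetical charge
g = matvec(N, f)
second_identity = g == [a + b for a, b in zip(matvec(B, g), matvec(taboo_N, f))]

g_in_E = matvec(N, [Fr(0), Fr(1), Fr(0), Fr(-2), Fr(0)])   # potential with support in E
print(first_identity, second_identity)
```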


Lemma 8-18: The fundamental matrix for $P^E$ is $N_E$.

Proof: The assertion is probabilistically clear. For $j$ in $E$, the number of times the process $P^E$ (that is, $P$ watched only in states of $E$) is in $j$ is the same as the number of times the process $P$ is in $j$.


Proposition 8-19: If $g$ is a potential with support in $E$, then $g_E$ determines $g$, $g = B^E g$, $f_E = (I - P^E)g_E$, and $g_E = N_E f_E$.

Proof: The fact that $g = B^E g$ is immediate from Lemma 8-17. Hence $g_E$ determines $g$. Since $g = B^E g$, we have $f_E = (I - P^E)g_E$ by conclusion (3) of Proposition 8-16. Finally $g_E = N_E f_E$ either by Lemma 8-18 or by direct calculation:

$$\begin{pmatrix} g_E \\ g_{\tilde E} \end{pmatrix} = \begin{pmatrix} N_E & N_{E\tilde E} \\ N_{\tilde E E} & N_{\tilde E} \end{pmatrix}\begin{pmatrix} f_E \\ 0 \end{pmatrix} = \begin{pmatrix} N_E f_E \\ N_{\tilde E E} f_E \end{pmatrix}.$$
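Next we shall prove that the columns of $B^E$ are always potentials.


Proposition 8-20: For any set of states $E$ the columns of $B^E$ are potentials with support in $E$, and

$$B^E = N\begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}.$$

Proof: Since

$$\alpha\begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} = \big(\alpha_E(I - P^E) \;\; 0\big)$$

is finite-valued, each column of $N\begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix}$ is a potential with support in $E$. Thus by Proposition 8-19,

$$\begin{aligned} N\begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} &= B^E N\begin{pmatrix} I - P^E & 0 \\ 0 & 0 \end{pmatrix} = B^E\begin{pmatrix} N_E(I - P^E) & 0 \\ N_{\tilde E E}(I - P^E) & 0 \end{pmatrix}\\ &= B^E\begin{pmatrix} I & 0 \\ N_{\tilde E E}(I - P^E) & 0 \end{pmatrix} \qquad\text{by Lemma 8-18}\\ &= B^E, \end{aligned}$$

where the last equality holds because the $\tilde E$-columns of $B^E$ are zero.


Corollary 8-21: For any set $E$, $\lim_n P^n B^E = 0$.

Proof: Apply Proposition 8-20 and Corollary 8-9.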
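Lemma 8-18 and Proposition 8-20 can be confirmed on the hypothetical five-state killed walk; the helpers are ours. The sketch computes $P^E$ as the $E \times E$ block of $PB^E$: the first step goes to some $k$, and from $k$ the process next enters $E$ at $j$. It then checks two identities exactly. First, $(I - P^E)^{-1}$ equals $N$ restricted to $E$. Second, $N\begin{pmatrix} I - P^E & 0\\ 0 & 0\end{pmatrix}$ reproduces $B^E$.

```python
from fractions import Fraction as Fr

def eye(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fr(0))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse(A):
    """Gauss-Jordan elimination over the rationals; A is assumed invertible."""
    n = len(A)
    M = [list(row) + eye(n)[i] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def killed_walk(n):
    """Hypothetical example chain: symmetric walk on {0, ..., n-1}, absorbed off either end."""
    P = [[Fr(0)] * n for _ in range(n)]
    for i in range(n - 1):
        P[i][i + 1] = P[i + 1][i] = Fr(1, 2)
    return P

def hitting_matrix(P, E):
    """B^E_ij = Pr_i[the first state of E reached (time 0 included) is j]."""
    n = len(P)
    comp = [i for i in range(n) if i not in E]
    B = [[Fr(0)] * n for _ in range(n)]
    for i in E:
        B[i][i] = Fr(1)
    if comp:
        NQ = inverse(sub(eye(len(comp)), [[P[i][j] for j in comp] for i in comp]))
        for a, i in enumerate(comp):
            for j in E:
                B[i][j] = sum((NQ[a][b] * P[k][j] for b, k in enumerate(comp)), Fr(0))
    return B

P = killed_walk(5)
n = len(P)
N = inverse(sub(eye(n), P))
E = [1, 3]                                        # hypothetical set of states
B = hitting_matrix(P, E)
PB = matmul(P, B)
P_E = [[PB[i][j] for j in E] for i in E]          # the chain watched only in E
I_minus_PE = sub(eye(len(E)), P_E)
N_E = [[N[i][j] for j in E] for i in E]

lemma_8_18 = inverse(I_minus_PE) == N_E
rhs = [[sum((N[i][E[a]] * I_minus_PE[a][E.index(j)] for a in range(len(E))), Fr(0))
        if j in E else Fr(0) for j in range(n)] for i in range(n)]
proposition_8_20 = rhs == B
print(P_E, lemma_8_18, proposition_8_20)
```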


Finally we work toward a proof that a non-negative superregular function dominated by a potential is a potential, a result we state as Proposition 8-25.


Lemma 8-22: If $E$ is a finite set and if $h$ is non-negative superregular, then $B^E h$ is a pure potential of finite support.

Proof: $B^E h$ is a finite linear combination of columns of $B^E$ and is therefore by Proposition 8-20 a potential with support in the finite set $E$. Since $h$ is non-negative superregular, $B^E h$ is non-negative superregular by conclusion (3) of Proposition 8-16. Hence, $B^E h$ is a pure potential.


Proposition 8-23: Every non-negative superregular function is the limit of an increasing sequence of pure potentials of finite support.

Proof: Let $E_1 \subset E_2 \subset E_3 \subset \cdots$ be an increasing sequence of finite sets with union the set of all states $S$, and let $h^{(n)} = B^{E_n}h$. Then $h^{(n)}$ is an increasing sequence of pure potentials of finite support by Lemma 8-22 and conclusion (4) of Proposition 8-16. If $i$ is in $E_n$, then $h^{(n)}_i = h_i$, so that $\lim_n h^{(n)} = h$.
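The construction in Proposition 8-23 can be watched on the hypothetical five-state killed walk; the helpers and the nested sets are ours. The sketch takes $h = \mathbf{1}$, which is superregular because $P\mathbf 1 \le \mathbf 1$, and forms $h^{(m)} = B^{E_m}h$ along $E_1 \subset E_2 \subset \cdots \subset S$. It checks three things: each $h^{(m)}$ has a non-negative charge supported in $E_m$, the sequence increases entrywise, and the last term equals $h$.

```python
from fractions import Fraction as Fr

def eye(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matvec(A, v):
    return [sum((a * x for a, x in zip(row, v)), Fr(0)) for row in A]

def inverse(A):
    """Gauss-Jordan elimination over the rationals; A is assumed invertible."""
    n = len(A)
    M = [list(row) + eye(n)[i] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def killed_walk(n):
    """Hypothetical example chain: symmetric walk on {0, ..., n-1}, absorbed off either end."""
    P = [[Fr(0)] * n for _ in range(n)]
    for i in range(n - 1):
        P[i][i + 1] = P[i + 1][i] = Fr(1, 2)
    return P

def hitting_matrix(P, E):
    """B^E_ij = Pr_i[the first state of E reached (time 0 included) is j]."""
    n = len(P)
    comp = [i for i in range(n) if i not in E]
    B = [[Fr(0)] * n for _ in range(n)]
    for i in E:
        B[i][i] = Fr(1)
    if comp:
        NQ = inverse(sub(eye(len(comp)), [[P[i][j] for j in comp] for i in comp]))
        for a, i in enumerate(comp):
            for j in E:
                B[i][j] = sum((NQ[a][b] * P[k][j] for b, k in enumerate(comp)), Fr(0))
    return B

P = killed_walk(5)
n = len(P)
h = [Fr(1)] * n                                   # superregular since P1 <= 1
nested_sets = [[2], [1, 2], [1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3, 4]]   # hypothetical E_1, ..., E_5 = S
approximations, charges = [], []
for E in nested_sets:
    h_m = matvec(hitting_matrix(P, E), h)         # h^(m) = B^(E_m) h
    approximations.append(h_m)
    charges.append(matvec(sub(eye(n), P), h_m))   # charge (I - P) h^(m)
print(approximations[0], charges[0])
```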


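If $g$ is a potential with charge $f$, then the total charge of $g$ (or $f$) is defined to be $\alpha f$ (see Section 7-5).

Lemma 8-24: If $N\bar f \le Nf$ with $\bar f \ge 0$ and $\alpha f$ finite, then

$$0 \le \alpha\bar f \le \alpha f < \infty.$$

Proof: By the dual of Proposition 8-23, we may find a sequence of finite measures $\pi^{(n)}$ such that $\alpha$ is the monotone limit of $\pi^{(n)}N$. Since $N\bar f \le Nf$, we have $\pi^{(n)}(N\bar f) \le \pi^{(n)}(Nf)$ and

$$\lim_n \pi^{(n)}(N\bar f) \le \lim_n \pi^{(n)}(Nf).$$

Since $\bar f \ge 0$,

$$\pi^{(n)}(N\bar f) = (\pi^{(n)}N)\bar f,$$

and

$$\lim_n \pi^{(n)}(N\bar f) = \big(\lim_n \pi^{(n)}N\big)\bar f = \alpha\bar f$$

by monotone convergence with $\bar f$ as the measure. And since $\pi^{(n)}Nf^+ \le \alpha f^+ < \infty$ and $\pi^{(n)}Nf^- \le \alpha f^- < \infty$,

$$\pi^{(n)}(Nf) = (\pi^{(n)}N)f^+ - (\pi^{(n)}N)f^-.$$

Hence

$$\lim_n \pi^{(n)}(Nf) = \alpha f^+ - \alpha f^- = \alpha f$$

by monotone convergence for each term. Thus $\alpha\bar f \le \alpha f$; $\alpha\bar f \ge 0$ since $\bar f \ge 0$, and $\alpha f < \infty$ by hypothesis.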


Proposition 8-25: If $h$ is a non-negative superregular function dominated by a potential $g$, then $h$ is a potential and its total charge is no greater than the total charge of $g$.

Proof: Let $g = Nf$. Write $h = N\bar f + \lim_n P^n h$ with $\bar f \ge 0$. Since $0 \le h \le g$, we have $0 \le P^n h \le P^n g$. But $P^n g \to 0$ by Corollary 8-9, so that $P^n h \to 0$ and $h = N\bar f$. Since $\alpha|f| < \infty$, we have, by Lemma 8-24, $0 \le \alpha\bar f \le \alpha f < \infty$. Hence $h$ is a potential and $\alpha\bar f \le \alpha f$.


Corollary 8-26: A non-negative potential $g = Nf$ has non-negative total charge.

Proof: Let $g = Nf \ge 0$, and set $\bar f = 0$. Since $\alpha f$ is finite, $\alpha f \ge \alpha\bar f = 0$ by Lemma 8-24.
Proposition 8-25 has an interesting interpretation in terms of the enlarged chain. The $j$th column of $N$ is a potential with charge $\{\delta_{ij}\}$. If the infimum of the column is positive, then the column dominates a constant function $k\mathbf{1}$ with $k > 0$, so that $\mathbf{1}$ is a potential by Proposition 8-25. Hence, the infimum of every column of $N$ is zero unless the enlarged chain is absorbing. In particular, if $P\mathbf{1} = \mathbf{1}$, then the infimum of every column of $N$ is 0.

In the case of the symmetric random walk in three dimensions, $P\mathbf{1} = \mathbf{1}$. Thus the infimum of every column of $N$ is zero, and since $P$ is symmetric, the infimum of every row is zero. This fact does not prove Proposition 7-10, but it does give us more insight into that result.


3. Equilibrium sets and capacities 


In proving analogs in the next section to the classical potential 
principles, we shall need to restrict the supports of the charges involved. 
The notion we shall need is that of an equilibrium set. 


Definition 8-27: A set $E$ is an equilibrium set for $P$ if there is a pure potential which assumes the value 1 at every point of $E$ and which has support in $E$. Such a potential is called an equilibrium potential for $E$. A set $E$ is a dual equilibrium set for $P$ if there is a pure potential measure with support in $E$ which equals $\alpha$ on $E$.


We proceed to give two characterizations of equilibrium potentials. 


Proposition 8-28: A set $E$ is an equilibrium set if and only if both


(1) $\alpha e^E < \infty$, and
(2) for any starting distribution the set $E$ is entered only finitely
often a.e.


When $E$ is an equilibrium set, the hitting vector $h^E$ is the unique
equilibrium potential and its charge is the escape vector $e^E$.


PROOF: Suppose $E$ is an equilibrium set. If $x$ is an equilibrium
potential for $E$, then $B^E x = x$ by Proposition 8-19 and $x_E = \mathbf 1$ by
definition. Since $x_{\tilde E}$ does not affect the value of $B^E x$, we have

$$x = B^E x = B^E\mathbf 1 = h^E,$$

and $h^E$ must be the equilibrium potential. Its charge is

$$(I - P)h^E = e^E$$

by Theorem 8-4 and Proposition 5-8. Thus $E$ is an equilibrium set if
and only if $h^E$ is a potential, or if and only if $\alpha e^E < \infty$ and $\lim_n P^n h^E = 0$.
But $s^E = \lim_n P^n h^E$ by conclusion (7) of Proposition 5-8, and $s^E_i$ is the
probability, starting at $i$, of being in $E$ infinitely often; so $s^E = 0$ is exactly condition (2).


Corollary 8-29: All finite sets are equilibrium sets.


PROOF: Apply Propositions 4-28 and 8-28.
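To make Proposition 8-28 concrete, here is a small numerical sketch (a hypothetical example, not taken from the text). It assumes a finite transient chain: the symmetric random walk on five states that is killed when it steps off either end, with $\alpha = \mathbf 1^T$, which is superregular because every column of $P$ sums to at most 1. The helper names `solve`, `walk`, `hitting` and `fundamental` are illustrative only. The sketch computes $h^E$ by solving "equal to 1 on $E$, regular off $E$", forms $e^E = (I - P)h^E$, and checks that $h^E = Ne^E$, that $e^E$ vanishes off $E$, and that $C(E) = \alpha e^E$. Exact rational arithmetic lets the identities be tested with equality.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; A must be nonsingular."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [a / M[c][c] for a in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * z for a, z in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def walk(n):
    """Symmetric random walk on {0, ..., n-1}, killed when it steps off either end."""
    half = Fraction(1, 2)
    return [[half if abs(i - j) == 1 else Fraction(0) for j in range(n)] for i in range(n)]


def hitting(P, E):
    """h^E: equal to 1 on E and P-regular off E (probability of ever entering E)."""
    n = len(P)
    A = [[int(i == j) - (0 if i in E else P[i][j]) for j in range(n)] for i in range(n)]
    return solve(A, [1 if i in E else 0 for i in range(n)])


def fundamental(P):
    """N = (I - P)^(-1), computed column by column."""
    n = len(P)
    I_minus_P = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]
    cols = [solve(I_minus_P, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


n = 5
P = walk(n)
alpha = [1] * n        # alpha = 1^T is superregular: every column of P sums to at most 1
E = {1, 3}
h = hitting(P, E)
e = [h[i] - sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]  # e^E = (I - P) h^E
N = fundamental(P)
Ne = [sum(N[i][j] * e[j] for j in range(n)) for i in range(n)]       # should reproduce h^E
capacity = sum(a * x for a, x in zip(alpha, e))                      # C(E) = alpha e^E
print("h^E =", h, "e^E =", e, "C(E) =", capacity)
```

Here $e^E$ is positive only at the two points of $E$, and the potential of that charge reproduces the hitting probabilities exactly, as the proposition asserts.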


Proposition 8-30: If $E$ is an equilibrium set, then the equilibrium
potential is the pointwise infimum of all pure potentials which dominate
$\mathbf 1$ on $E$.


PROOF: Since $h^E = B^E\mathbf 1$, the result follows from conclusion (2) of
Proposition 8-16.


We shall use the notation $\eta^E$ for dual $e^E$.


Definition 8-31: If $E$ is an equilibrium set, the capacity of $E$ is defined
by $C(E) = \alpha e^E = \eta^E\mathbf 1$.


In terms of total charge, Definition 8-31 states that the capacity of 
an equilibrium set is to be the total charge of the equilibrium potential. 


Lemma 8-32: A set $E$ is an equilibrium set if and only if both
$(P^E)^n\mathbf 1 \to 0$ and $\alpha_E[(I - P^E)\mathbf 1] < \infty$. If $E$ is an equilibrium set, then

$$C(E) = \alpha_E[(I - P^E)\mathbf 1] = [\alpha_E(I - \hat P^E)]\mathbf 1.$$


PROOF: We shall apply Proposition 8-28. $[(P^E)^n\mathbf 1]_i$ is the probability,
starting in $i \in E$, of returning to $E$ at least $n$ times. Thus $(P^E)^n\mathbf 1 \to 0$
is a necessary and sufficient condition for being in $E$ only finitely often
a.e. for any starting distribution. Secondly, $(I - P^E)\mathbf 1 = e^E_E$ and
$\alpha_E e^E_E = \alpha e^E$. Hence $\alpha e^E$ is finite if and only if $\alpha_E[(I - P^E)\mathbf 1] < \infty$.
And if $E$ is an equilibrium set, then

$$C(E) = \alpha_E[(I - P^E)\mathbf 1].$$

Under duality a number is transformed into itself. Hence

$$C(E) = [(\text{dual}\ \mathbf 1)(\text{dual}\ (I - P^E))](\text{dual}\ \alpha_E) = [\alpha_E(I - \hat P^E)]\mathbf 1.$$
Proposition 8-33: $E$ is an equilibrium set if and only if $\mathbf 1$ is a potential
for $P^E$ with $\alpha_E$ as the distinguished measure. Also $C(E)$ is the same
computed for $P$ as for $P^E$.


PROOF: $\mathbf 1$ is always superregular for $P^E$. The two conditions given
by Lemma 8-32 are precisely the conditions that $\mathbf 1$ be a potential. Also
$\alpha_E[(I - P^E)\mathbf 1]$ is the capacity of $E$ in $P^E$.
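The expressions for the capacity in Lemma 8-32 and Proposition 8-33 can be compared on a finite example. This sketch assumes a hypothetical 4-state chain that is not symmetric but whose rows and columns all sum to 3/4, so that $\alpha = \mathbf 1^T$ is superregular and the $\alpha$-dual $\hat P$ is simply the transpose. It builds $P^E$ from the hitting matrix through $P^E_{ij} = \sum_k P_{ik}B^E_{kj}$ and checks that $\alpha e^E$, $\alpha_E[(I - P^E)\mathbf 1]$, and the capacity computed in $\hat P$ agree (compare Proposition 8-35). The function names are illustrative, not from the text.

```python
from fractions import Fraction as F


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; A must be nonsingular."""
    n = len(A)
    M = [[F(a) for a in row] + [F(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [a / M[c][c] for a in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * z for a, z in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def dirichlet(P, boundary):
    """Bounded x with x = boundary on E = keys(boundary) and x = P x off E (i.e. B^E applied)."""
    n = len(P)
    A = [[int(i == j) - (0 if i in boundary else P[i][j]) for j in range(n)] for i in range(n)]
    return solve(A, [boundary.get(i, 0) for i in range(n)])


def watched(P, E):
    """P^E as a dict: P^E[i, j] = sum_k P_ik B^E_kj, the chain watched only on E."""
    cols = {j: dirichlet(P, {k: int(k == j) for k in E}) for j in E}  # columns of B^E
    return {(i, j): sum(P[i][k] * cols[j][k] for k in range(len(P))) for i in E for j in E}


def capacity(P, alpha, E):
    """C(E) = alpha_E [(I - P^E) 1], the expression of Lemma 8-32."""
    PE = watched(P, E)
    return sum(alpha[i] * (1 - sum(PE[i, j] for j in E)) for i in E)


# Hypothetical 4-state chain, not symmetric; every row and column sums to 3/4, so
# alpha = 1^T is superregular and the alpha-dual P_hat is just the transpose.
P = [[F(0), F(1, 2), F(1, 4), F(0)],
     [F(1, 4), F(0), F(1, 4), F(1, 4)],
     [F(0), F(1, 4), F(0), F(1, 2)],
     [F(1, 2), F(0), F(1, 4), F(0)]]
P_hat = [[P[j][i] for j in range(4)] for i in range(4)]
alpha = [1, 1, 1, 1]
E = {0, 2}

h = dirichlet(P, {i: 1 for i in E})                                   # h^E
e = [h[i] - sum(P[i][j] * h[j] for j in range(4)) for i in range(4)]   # e^E = (I - P) h^E
C_direct = sum(a * x for a, x in zip(alpha, e))                       # alpha e^E
C_watched = capacity(P, alpha, E)                                     # computed in P^E
C_dual = capacity(P_hat, alpha, E)                                    # capacity for P_hat
print(C_direct, C_watched, C_dual)
```

For a finite set all three numbers coincide; the interest of Proposition 8-35 lies in infinite sets, where the products need not associate.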


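Proposition 8-34: $E$ is an equilibrium set for $\hat P$ if and only if it is a
dual equilibrium set for $P$. When $E$ is such a set, $\alpha_E = \hat\eta^E N_E$, where
$\hat\eta^E$ denotes dual $\hat e^E$ and $N_E$ is the fundamental matrix of $P^E$.


PROOF: $E$ is an equilibrium set for $\hat P$ if and only if $\mathbf 1$ is a potential for
$\hat P^E$ with $\mathbf 1 = \hat N_E\hat e^E_E$. By duality, this condition is equivalent to the
assertion that $\alpha_E$ is a potential measure for $P^E$ with $\alpha_E = \hat\eta^E N_E$. The
result then follows from the dual of Proposition 8-33.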


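We would like the result that $C(E) = \hat C(E)$. However, an equilibrium
set for $P$ need not be an equilibrium set for $\hat P$ (see Section 6). There-
fore, the following is the best possible result:


Proposition 8-35: If $E$ is an equilibrium set for both $P$ and $\hat P$, then
$C(E) = \hat C(E)$.


PROOF: By Proposition 8-34, we have $\alpha_E = \hat\eta^E N_E$, so that $\hat\eta^E =
\alpha_E(I - P^E)$ by Lemma 8-18. Since all the terms below are non-negative, the
products associate; using $N_E e^E_E = \mathbf 1$ (Proposition 8-33) and Lemma 8-32 applied to $\hat P$, we obtain

$$C(E) = \alpha_E e^E_E = (\hat\eta^E N_E)e^E_E = \hat\eta^E(N_E e^E_E) = \hat\eta^E h^E_E = \hat\eta^E\mathbf 1 = [\alpha_E(I - P^E)]\mathbf 1 = \hat C(E).$$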


Proposition 8-36: If $F$ is a dual equilibrium set and $E \subset F$, then $E$ is
a dual equilibrium set and $\hat C(E) = \hat\eta^F h^E$.


PROOF: We shall use Proposition 8-28 to prove that $E$ is a dual
equilibrium set. By Proposition 8-34, $\alpha_F = (\hat\eta^F N)_F$ and $\hat C(F) =
\hat\eta^F\mathbf 1 < \infty$. Hence $\alpha_E = (\hat\eta^F N)_E$ and

$$\alpha_E(I - P^E) = \left[\hat\eta^F N\begin{pmatrix} I - P^E & 0\\ 0 & 0\end{pmatrix}\right]_E = (\hat\eta^F B^E)_E$$

by Proposition 8-20. Then

$$\alpha\hat e^E = \alpha_E[(I - \hat P^E)\mathbf 1] = [\alpha_E(I - P^E)]\mathbf 1 = (\hat\eta^F B^E)_E\mathbf 1 = \hat\eta^F B^E\mathbf 1 = \hat\eta^F h^E.$$


Since $\hat\eta^F h^E \le \hat\eta^F\mathbf 1 < \infty$, we have just verified the first condition of
Proposition 8-28, namely that $\alpha\hat e^E < \infty$. The second condition is trivial for a
subset of an equilibrium set. Hence $E$ is a dual equilibrium set and
$\hat C(E) = \alpha\hat e^E = \hat\eta^F h^E$.

The dual of this result states that any subset $E$ of an equilibrium set
$F$ is an equilibrium set, and $C(E) = \eta^F\hat h^E$.


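Proposition 8-37: The union of a finite number of equilibrium sets is
an equilibrium set.


PROOF: Let $E_1, \ldots, E_n$ be equilibrium sets and let $E = \bigcup_{k=1}^n E_k$.
For $i \in E_k$ we have $e^E_i \le e^{E_k}_i$, since escaping from the larger set $E$ implies escaping from $E_k$.
Then

$$\sum_{i\in E}\alpha_i e^E_i \le \sum_{k=1}^n\sum_{i\in E_k}\alpha_i e^E_i \le \sum_{k=1}^n\sum_{i\in E_k}\alpha_i e^{E_k}_i = \sum_{k=1}^n C(E_k) < \infty,$$

and if the process is in each $E_k$ only finitely often a.e., then it is in $E$
only finitely often a.e. Hence, by Proposition 8-28, $E$ is an equilibrium
set.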


Some of the classical results hold only if the support of a potential is 
a reasonably small set. It will always be satisfactory to have a finite 
set as support. A more general assumption is that the support is an 
equilibrium set. Since equilibrium sets include all finite sets, since a 
subset of an equilibrium set is an equilibrium set, and since finite 
unions of equilibrium sets are equilibrium sets, we may think of 
equilibrium sets as a class of “reasonably small” sets. 

Choquet has introduced a generalized notion of capacity. In our 
case his definition takes the following form. 


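Definition 8-38: A Choquet capacity is a non-negative monotone
increasing set function $C$ such that, for any sets $A_1, A_2, \ldots, A_n$,

$$C(A_1\cap A_2\cap\cdots\cap A_n) \le \sum_i C(A_i) - \sum_{i<j} C(A_i\cup A_j) + \sum_{i<j<k} C(A_i\cup A_j\cup A_k) - \cdots + (-1)^{n+1}C(A_1\cup A_2\cup\cdots\cup A_n).$$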


A simple way of constructing one of these capacities is to let $\pi$ be a
fixed starting distribution and to take $C(E)$ to be the probability of ever
entering $E$. That is, $C(E) = \pi h^E$. This set function is monotone
because $h^E$ is. The right side of the inequality in the definition of
capacity is the probability that all the sets are entered. The left side is the
probability that the intersection of the sets is entered, which is one way
of entering all the sets, though in general not the only way. Hence
Choquet's definition is satisfied. Since we may clearly also replace
$C(E)$ by $kC(E)$ with $k > 0$, $\pi$ may be any non-negative finite measure.

We shall show from this construction that our Definition 8-31 yields
a Choquet capacity on equilibrium sets. For convenience, we will give
a proof for $\hat P$. For any fixed equilibrium set $F$ of $\hat P$, Proposition 8-36
tells us that the capacity of any subset $E$ is $\hat\eta^F h^E$. Hence the above
argument applies with $\pi = \hat\eta^F$. Thus the Choquet conditions hold for
all subsets of $F$ and, since any finite collection of equilibrium sets lies in their union,
which is again an equilibrium set by Proposition 8-37, they hold for all
equilibrium sets.
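The probabilistic argument above can be checked exhaustively on a small chain. In this sketch (a hypothetical example, not from the text) the chain moves right with probability 1/2 and left with probability 1/4 and is killed off either end of $\{0, \ldots, 4\}$; $\pi$ is an arbitrary starting distribution. It computes $C(A) = \pi h^A$ exactly for all 32 subsets and verifies monotonicity and the inequality of Definition 8-38 for every pair and every triple of subsets. The names `hitting` and `choquet_ok` are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; A must be nonsingular."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [a / M[c][c] for a in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * z for a, z in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def hitting(P, E):
    """h^E: equal to 1 on E and P-regular off E (probability of ever entering E)."""
    n = len(P)
    A = [[int(i == j) - (0 if i in E else P[i][j]) for j in range(n)] for i in range(n)]
    return solve(A, [1 if i in E else 0 for i in range(n)])


n = 5
# Hypothetical chain with drift: right w.p. 1/2, left w.p. 1/4, killed off either end.
P = [[Fraction(1, 2) if j == i + 1 else Fraction(1, 4) if j == i - 1 else Fraction(0)
      for j in range(n)] for i in range(n)]
pi = [Fraction(1, 2), 0, 0, Fraction(1, 4), Fraction(1, 4)]   # a starting distribution

subsets = [frozenset(s) for r in range(n + 1) for s in combinations(range(n), r)]
C = {A: sum(w * x for w, x in zip(pi, hitting(P, A))) for A in subsets}   # C(A) = pi h^A


def choquet_ok(sets):
    """Definition 8-38: C(intersection) <= sum over r of (-1)^(r+1) C(union of r sets)."""
    rhs = 0
    for r in range(1, len(sets) + 1):
        for group in combinations(sets, r):
            rhs += (-1) ** (r + 1) * C[frozenset().union(*group)]
    return C[frozenset.intersection(*sets)] <= rhs


monotone = all(C[A] <= C[B] for A in subsets for B in subsets if A <= B)
pairs_ok = all(choquet_ok(s) for s in combinations(subsets, 2))
triples_ok = all(choquet_ok(s) for s in combinations(subsets, 3))
print(monotone, pairs_ok, triples_ok)
```

The right-hand side is computed by inclusion and exclusion over unions, which is exactly the probability that every set is entered.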

A more general method of obtaining a Choquet capacity within our
framework is as follows. Let $\pi$ be a measure, and let $h$ be a strictly
positive superregular function such that $\pi h < \infty$. Define $C(E) =
\pi B^E h$. Forming the $h$-process, we see that $C(E) = \pi^*(B^E)^*\mathbf 1 = \pi^*(h^E)^*$
with $\pi^*\mathbf 1 = \pi h < \infty$. Thus, by the special case, $C(E)$ is a Choquet
capacity for the $h$-process and hence satisfies the same axioms in the
original chain. Moreover, we see that the situation in the earlier case
is just the present case with $h = \mathbf 1$. On the other hand, this more
general method includes a second interesting case: if $g = Nf$ is a pure
potential for which $\pi g$ is finite, then by Lemma 8-17,

$$\pi B^E g = \pi g - \pi\,{}^{E}\!N f;$$

$\pi B^E g$ is a Choquet capacity which assumes its maximum value $\pi g$ on all
sets $E$ containing the support of $f$. If $\pi\mathbf 1 = 1$, then $\pi B^E g$ in the game
interpretation is the expected gain when and after $E$ is reached.

Definition 8-31 is reasonable only for equilibrium sets, since otherwise
it is possible to have $e^E = 0$. We could instead restrict the definition
to finite sets and define the capacity of an infinite set as the supremum 
of the capacities of its finite subsets. We will show, under an additional 
assumption, that this new approach agrees with Definition 8-31 on 
equilibrium sets and assigns infinite capacity to all other sets. 


Proposition 8-39: If $\hat N$ has columns which tend to zero, then a set $E$
for which the supremum of the capacities of its finite subsets is finite is
an equilibrium set, and the supremum is the capacity of the set.


PROOF: Let $E_1 \subset E_2 \subset \cdots$ be an increasing sequence of finite sets
whose union is $E$. We must prove that if $\sup_n C(E_n)$ is finite, then $h^E$
is a potential and $C(E) = \sup_n C(E_n)$. First we note that $h^E$ is the
monotone limit of the $h^{E_n}$. If $i \in E_m$, then the $i$th component of $e^{E_n}$
decreases for $n \ge m$. Thus $\lim_n e^{E_n} = \bar e$ exists. Since $\hat N$ has columns
that tend to zero and since $\eta^{E_n}\mathbf 1 = C(E_n) \le \sup_n C(E_n) < \infty$,

$$\eta^{E_n}\hat N \to (\text{dual}\ \bar e)\hat N$$

by Proposition 1-58. By duality,

$$h^{E_n} = Ne^{E_n} \to N\bar e,$$

and

$$\alpha\bar e = (\text{dual}\ \bar e)\mathbf 1 = \sum_i\lim_n\eta^{E_n}_i,$$

which by Fatou's Theorem is

$$\le \lim_n(\eta^{E_n}\mathbf 1) = \lim_n C(E_n) = \sup_n C(E_n).$$

Hence $h^E = N\bar e$ and $\alpha\bar e < \infty$. Thus $E$ is an equilibrium set. Also
$C(E) = \alpha\bar e \le \sup_n C(E_n)$.
But $C(E_n) \le C(E)$ for every $n$, so that $\sup_n C(E_n) \le C(E)$. Thus
$C(E) = \sup_n C(E_n)$.


The converse, that an equilibrium set has the property that the 
supremum of the capacities of its finite subsets is finite, follows trivially 
from the monotonicity and the finiteness of capacity on equilibrium 
sets. 


4. Potential principles 


We shall now derive analogs of several of the fundamental theorems 
of classical potential theory. The first is the solution to the Dirichlet 
problem; in the uniqueness statement we shall need a lemma, for which 
we shall give two proofs. 


Lemma 8-40: If $P$ is an absorbing chain, then $P$ has no bounded
non-zero regular function.


PROOF 1: Suppose $Ph = h$ with $|h| \le c\mathbf 1$. Since $h_i = \sum_j P_{ij}h_j$, we
have $|h_i| \le \sum_j P_{ij}|h_j|$, or $|h| \le P|h|$. Therefore $|h| \le P^n|h| \le P^n(c\mathbf 1)$ for every $n$.
But $P^n\mathbf 1$ is the probability that the process continues at least until time
$n$, which tends to zero as $n$ tends to infinity because $P$ is absorbing.
Hence $h = 0$.


PROOF 2: Let $a$ be the absorbing state of $P$ and let $t = t_a$ be the time
to absorption; $t$ is a stopping time, finite a.e., since $P$ is absorbing. If $h = Ph$,
set

$$h'_i = \begin{cases} h_i & \text{if } i \ne a,\\ 0 & \text{if } i = a.\end{cases}$$

Then $h'(x_n(\omega))$ in the $P$-process forms a bounded martingale. By
Corollary 3-16 to the second martingale systems theorem,

$$h_i = M_i[h'(x_0(\omega))] = M_i[h'(x_t(\omega))] = h'_a = 0.$$

But the starting state $i$ is arbitrary, so that $h' = 0$ and $h = 0$.


The method in the second proof is of some importance, and we shall
meet it again later.


Theorem 8-41: Let $E$ be an arbitrary set of states, and suppose that
${}^{E}P$, the chain $P$ with $E$ made absorbing, is an absorbing chain. If $h_E$
is any bounded function defined on $E$, then there exists a unique
bounded function $h$ whose restriction to $E$ is $h_E$ and which is regular
on $\tilde E$. The function is

$$h = B^E\begin{pmatrix} h_E\\ 0\end{pmatrix}.$$


PROOF: For existence, set $h = B^E\binom{h_E}{0}$. The product is defined, since $h_E$ is
bounded and $B^E$ has row sums one. Then the restriction of $h$ to $E$ is $h_E$
because $(B^E)_{ij} = \delta_{ij}$ for $i$ and $j$ in $E$. Moreover,

$$(I - P)h = (I - P)\left[B^E\begin{pmatrix} h_E\\ 0\end{pmatrix}\right] = [(I - P)B^E]\begin{pmatrix} h_E\\ 0\end{pmatrix} = \begin{pmatrix} I - P^E & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix} h_E\\ 0\end{pmatrix} = \begin{pmatrix}(I - P^E)h_E\\ 0\end{pmatrix},$$

so that $h$ is regular outside of $E$; associativity is justified in the triple
product because $(I + P)B^E\binom{|h_E|}{0}$ is finite-valued.

For uniqueness, let $k$ be another such bounded function. Then $h - k$ is a bounded function
which is zero on $E$ and regular outside $E$. If $Q$ is the transition matrix
for the transient states of ${}^{E}P$, then $(h - k)_{\tilde E}$ is a bounded
$Q$-regular function, hence zero by Lemma 8-40, since $Q$ is absorbing. Thus $k = h$.


Next we prove the Maximum Principle.


Theorem 8-42: Let $E$ be an arbitrary set of states and suppose that $h$
is a finite-valued function such that $h = B^E h$. Then the supremum of
the values of $h$ is equal to the supremum of the values of $h$ on $E$. If
the states of $\tilde E$ communicate in ${}^{E}P$, if ${}^{E}P$ is absorbing, and if $h$ assumes
its maximum on $\tilde E$, then $h$ is constant on $\tilde E$.


Remark: Corresponding results hold for infima by replacing $h$ by $-h$.


PROOF:

$$h_i = \sum_{j\in E}B^E_{ij}h_j \le \sum_{j\in E}B^E_{ij}\Bigl(\sup_{k\in E}h_k\Bigr) \le \sup_{k\in E}h_k.$$

Suppose that $h$ assumes its maximum on $\tilde E$, that ${}^{E}P$ is absorbing, and
that the states of $\tilde E$ communicate in ${}^{E}P$. Let $i$ be a state of $\tilde E$ where the
maximum is assumed, and let $k$ be any state of $E$ that can be reached
in ${}^{E}P$ from $\tilde E$. Since the transient states of ${}^{E}P$ communicate, we have
$B^E_{ik} > 0$. Moreover,

$$\begin{aligned} h_i &= B^E_{ik}h_k + \sum_{j\ne k}B^E_{ij}h_j\\ &\le B^E_{ik}h_k + h_i\sum_{j\ne k}B^E_{ij} && \text{since } h_j \le h_i\\ &= B^E_{ik}h_k + h_i(1 - B^E_{ik}) && \text{since } B^E\mathbf 1 = \mathbf 1\\ &= h_i - B^E_{ik}(h_i - h_k). \end{aligned}$$

Therefore $h_k = h_i$ for all such $k$. Then for any $m \in \tilde E$, $B^E_{mj} > 0$
precisely for those $j$ for which $B^E_{ij} > 0$, and $h_j = h_i$ for those $j$. Thus

$$h_m = \sum_{j\in E}B^E_{mj}h_j = \sum_{j\in E}B^E_{mj}h_i = h_i.$$


Corollary 8-43: If $g$ is a potential with support in a finite set $E$, then $g$
is bounded.


PROOF: Since $g = B^E g$ for any potential with support in $E$, we may apply Theorem
8-42 to $g$ and to $-g$. The supremum and infimum over $E$ are taken over a finite number of values.


Corollary 8-44: Let $E$ be an arbitrary set of states, and suppose that


(1) ${}^{E}P$ is an absorbing chain,
(2) the states of $\tilde E$ communicate in ${}^{E}P$, and
(3) every state of $E$ can be reached in ${}^{E}P$ from $\tilde E$.

If $h$ is a bounded function regular outside of $E$, then $h$ cannot assume its
maximum on $\tilde E$ unless $h$ is constant everywhere.


PROOF: By Theorem 8-41, $h$ is the unique solution to the Dirichlet
problem for the function $h_E$. Hence

$$h = B^E\begin{pmatrix} h_E\\ 0\end{pmatrix}.$$

Multiplying through by $B^E$ and applying Proposition 5-8, we have

$$B^E h = B^E B^E\begin{pmatrix} h_E\\ 0\end{pmatrix} = B^E\begin{pmatrix} h_E\\ 0\end{pmatrix} = h.$$

By Theorem 8-42, $h$ is constant on $\tilde E$. As shown in the proof of that
theorem, $h$ assumes the same constant value at every state of $E$ which
can be reached in ${}^{E}P$ from $\tilde E$, and by (3) this is every state of $E$.
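A minimal sketch of the Dirichlet problem of Theorem 8-41 and the Maximum Principle of Theorem 8-42, under the assumption (a hypothetical example, not from the text) of the symmetric random walk on seven states killed off the ends, with $E = \{0, 6\}$, so that ${}^{E}P$ is absorbing and the hypotheses of Corollary 8-44 hold. With boundary values 2 and $-1$ the unique bounded solution regular off $E$ is linear interpolation, and its extreme values are taken on $E$. The helper names are illustrative.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; A must be nonsingular."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [a / M[c][c] for a in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * z for a, z in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def walk(n):
    """Symmetric random walk on {0, ..., n-1}, killed when it steps off either end."""
    half = Fraction(1, 2)
    return [[half if abs(i - j) == 1 else Fraction(0) for j in range(n)] for i in range(n)]


def dirichlet(P, boundary):
    """Bounded x with x = boundary on E = keys(boundary) and x = P x off E (i.e. B^E applied)."""
    n = len(P)
    A = [[int(i == j) - (0 if i in boundary else P[i][j]) for j in range(n)] for i in range(n)]
    return solve(A, [boundary.get(i, 0) for i in range(n)])


n = 7
P = walk(n)
E = {0, n - 1}
h = dirichlet(P, {0: 2, n - 1: -1})        # boundary values h_E
regular_off_E = all(h[i] == sum(P[i][j] * h[j] for j in range(n))
                    for i in range(n) if i not in E)
fixed_by_BE = dirichlet(P, {i: h[i] for i in E}) == h   # B^E h = h, the hypothesis of Theorem 8-42
const = dirichlet(P, {0: 3, n - 1: 3})                   # constant boundary data
print(h)
```

Uniqueness here reflects Lemma 8-40: the interior block of the walk is absorbing, so the difference of two solutions must vanish.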


The result that follows is the Principle of Domination. 


Theorem 8-45: Let $h$ be a finite-valued non-negative superregular
function, and let $g = Nf$ be a potential. If $h$ dominates $g$ on the
support of $f^+$, then $h$ dominates $g$ everywhere. If, in addition, $h$ is a
potential $N\bar f$, then $\alpha f \le \alpha\bar f$.


PROOF: If $g$ is a pure potential supported in $E$, then $g = B^E g$ by
Proposition 8-19. But by Proposition 8-16, $B^E g$ is the pointwise
infimum of all non-negative superregular functions which dominate $g$
on $E$. Thus the first half is proved if $g$ is a pure potential. For arbi-
trary $g$, write $g = Nf^+ - Nf^-$. We have $Nf^+ - Nf^- \le h$ on the
support of $f^+$, so that

$$Nf^+ \le h + Nf^-$$

on the support of $f^+$. Applying the special case to the superregular
function $h + Nf^-$ and the potential $Nf^+$, we have

$$Nf^+ \le h + Nf^- \quad\text{or}\quad g \le h$$

everywhere. Finally, if $h = N\bar f$, then

$$Nf^+ \le N(\bar f + f^-)$$

implies

$$\alpha f^+ \le \alpha(\bar f + f^-)$$

by Lemma 8-24. Hence $\alpha f \le \alpha\bar f$.

Next we prove the Principle of Balayage.


Theorem 8-46: If $g$ is a pure potential and if $E$ is any set of states,
then there is a unique pure potential $\bar g$ with support in $E$ such that
$\bar g = g$ on $E$. The potential $\bar g$ satisfies $\bar g \le g$ everywhere, and its total
charge does not exceed the total charge of $g$.


PROOF: For existence, let $\bar g = B^E g$. Then $\bar g \le g$, $\bar g_E = g_E$, and $\bar g$
is superregular by conclusions (1) and (3) of Proposition 8-16; since $\bar g$ is regular
outside $E$, its charge has support in $E$. By
Proposition 8-25, $\bar g$ is a potential, and the total charge of $\bar g$ is less than or
equal to that of $g$.

For uniqueness, if $h$ were another such potential, we would have

$$\bar g = B^E\bar g = B^E h = h$$

by Proposition 8-15 and the fact that $\bar g_E = g_E = h_E$.


If $E$ is any set and if $g$ is a pure potential, we refer to the potential $\bar g$
of Theorem 8-46 as the balayage potential of $g$ on $E$.


Corollary 8-47: The balayage potential $\bar g = B^E g$ of $g$ on $E$ is the
pointwise infimum of all pure potentials which dominate $g$ on $E$.


PROOF: Apply conclusion (2) of Proposition 8-16.


Corollary 8-48: The balayage potential of $g$ on $E$ is the supremum of
all pure potentials with support in $E$ which are dominated by $g$ on $E$.


PROOF: Certainly the balayage potential does have the stated
property. Thus let $\bar g = B^E g$ and let $h$ be a pure potential with support in
$E$ and with $h_E \le g_E$. Then by Proposition 8-19, $h = B^E h$ and

$$\bar g = B^E g \ge B^E h = h.$$

If $g$ has support in $E$, then $g$ itself is the balayage potential of $g$ on $E$.
In particular, $h^E$ is the balayage potential of $h^E$ on $E$ for $E$ an equilibrium
set.
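Balayage can be illustrated in the same finite setting. This sketch assumes (hypothetically) the killed symmetric walk on six states, a non-negative charge $f$ chosen for illustration, and $E = \{1, 4\}$. It computes $\bar g = B^E g$ as the Dirichlet solution with boundary values $g_E$ and checks the conclusions of Theorem 8-46: $\bar g \le g$, $\bar g = g$ on $E$, the charge $(I - P)\bar g$ is non-negative with support in $E$, and the total charge does not increase.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; A must be nonsingular."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [a / M[c][c] for a in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * z for a, z in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def walk(n):
    """Symmetric random walk on {0, ..., n-1}, killed when it steps off either end."""
    half = Fraction(1, 2)
    return [[half if abs(i - j) == 1 else Fraction(0) for j in range(n)] for i in range(n)]


def dirichlet(P, boundary):
    """Bounded x with x = boundary on E = keys(boundary) and x = P x off E (i.e. B^E applied)."""
    n = len(P)
    A = [[int(i == j) - (0 if i in boundary else P[i][j]) for j in range(n)] for i in range(n)]
    return solve(A, [boundary.get(i, 0) for i in range(n)])


def fundamental(P):
    """N = (I - P)^(-1), computed column by column."""
    n = len(P)
    I_minus_P = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]
    cols = [solve(I_minus_P, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


n = 6
P = walk(n)
N = fundamental(P)
f = [0, 1, 0, 2, 0, 1]          # hypothetical non-negative charge: g = N f is a pure potential
g = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]
E = {1, 4}
g_bar = dirichlet(P, {i: g[i] for i in E})       # B^E g depends only on g restricted to E
f_bar = [g_bar[i] - sum(P[i][j] * g_bar[j] for j in range(n)) for i in range(n)]  # (I - P) g_bar
print(g, g_bar, f_bar)
```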
Next we prove the Principle of Lower Envelope. 


Lemma 8-49: The pointwise infimum of non-negative superregular
functions is non-negative superregular.


PROOF: It is clearly non-negative. If

$$h_\xi \ge Ph_\xi$$

for all $\xi$, then

$$P\Bigl(\inf_\xi h_\xi\Bigr) \le Ph_\xi \le h_\xi$$

for all $\xi$, so that

$$P\Bigl(\inf_\xi h_\xi\Bigr) \le \inf_\xi h_\xi.$$


Theorem 8-50: The pointwise infimum of pure potentials is a pure
potential.


PROOF: Apply Lemma 8-49 and Proposition 8-25.


Finally we prove the Principle of Condensers. We are to think of
two sets $E$ and $F$ as the two plates of a condenser, with a positive charge
placed on $E$ and a negative charge placed on $F$ in such a way as to
produce a unit voltage drop. Since in equilibrium there should be a
uniform voltage on each plate and since the 0-value of voltage is
arbitrary, we will require that the potential be 1 on $E$ and 0 on $F$.
The theorem is proved for an equilibrium set $E$ with finite boundary,
that is, a set $E$ that can be entered or left only through a finite
subset of its states, the boundary of $E$.


Theorem 8-51: Let $E$ be an equilibrium set with finite boundary, and
let $F$ be any set of states disjoint from $E$. Then there is a potential $g = Nf$
which is 1 on $E$ and 0 on $F$ and which is such that $f^+$ has support in $E$,
$f^-$ has support in $F$, and $\alpha f \ge 0$.


PROOF: Let $g_i$ be the probability, starting at $i$, that $E$ is reached
before $F$. Clearly, $g$ is 1 on $E$ and 0 on $F$. Furthermore, $0 \le g \le h^E$.
Since $P^n h^E \to 0$ for the equilibrium set $E$, we have $P^n g \to 0$.

Let $f = (I - P)g$. We are going to apply Corollary 8-8 to conclude
that $g$ is a potential with charge $f$, but to do so we must show that $\alpha|f|$
is finite. If $i$ is in the complement of $E \cup F$, then $g_i = (Pg)_i$; hence $f$
has support in $E \cup F$. Write

$$P^{E\cup F} = \begin{pmatrix} X & Y\\ Z & W\end{pmatrix},$$

with rows and columns indexed by $E$ and then $F$. Noting that if $i \in E \cup F$, then $(Pg)_i$ is the probability that the next
entry to $E \cup F$ is in $E$, we have

$$f_i = \begin{cases} 1 - (Pg)_i = 1 - (X\mathbf 1)_i & \text{if } i \in E,\\ 0 - (Pg)_i = -(Z\mathbf 1)_i & \text{if } i \in F.\end{cases}$$

Hence

$$f = \begin{cases} \mathbf 1 - X\mathbf 1 & \text{on } E,\\ -Z\mathbf 1 & \text{on } F.\end{cases}$$

Thus $f^+$ has support in $E$ and $f^-$ has support in $F$. Furthermore,
$(X\mathbf 1)_i = 1$ except when $i$ is on the boundary of $E$; hence $f^+$ has finite support and
$\alpha f^+ < \infty$. Moreover,

$$\alpha f^- = \alpha_F Z\mathbf 1 = \sum_{j\in E}(\alpha_F Z)_j < \infty,$$

since the $j$th column of $Z$ is zero except when $j$ is on the boundary of $E$, and each
$(\alpha_F Z)_j \le \alpha_j$ because $\alpha_{E\cup F}$ is superregular for $P^{E\cup F}$. Thus $\alpha|f| < \infty$.
Hence $g$ is a potential. Finally, $\alpha f \ge 0$ by Corollary 8-26.
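For the Principle of Condensers, here is a sketch on the killed symmetric walk on seven states, with hypothetical plates $E = \{0, 1\}$ and $F = \{5, 6\}$ (in a finite example the boundary hypothesis holds trivially). As in the proof, $g$ is the probability of reaching $E$ before $F$, obtained here as the Dirichlet solution with values 1 on $E$ and 0 on $F$, and $f = (I - P)g$. The checks mirror the statement of Theorem 8-51; the names are illustrative.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; A must be nonsingular."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [a / M[c][c] for a in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * z for a, z in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def walk(n):
    """Symmetric random walk on {0, ..., n-1}, killed when it steps off either end."""
    half = Fraction(1, 2)
    return [[half if abs(i - j) == 1 else Fraction(0) for j in range(n)] for i in range(n)]


def dirichlet(P, boundary):
    """Bounded x with x = boundary on the keys of boundary and x = P x elsewhere."""
    n = len(P)
    A = [[int(i == j) - (0 if i in boundary else P[i][j]) for j in range(n)] for i in range(n)]
    return solve(A, [boundary.get(i, 0) for i in range(n)])


def fundamental(P):
    """N = (I - P)^(-1), computed column by column."""
    n = len(P)
    I_minus_P = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]
    cols = [solve(I_minus_P, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


n = 7
P = walk(n)
N = fundamental(P)
E, F = {0, 1}, {5, 6}
boundary = {**{i: 1 for i in E}, **{i: 0 for i in F}}
g = dirichlet(P, boundary)                                            # P(reach E before F)
f = [g[i] - sum(P[i][j] * g[j] for j in range(n)) for i in range(n)]  # f = (I - P) g
print(g, f)
```

The positive charge sits on $E$, the negative charge on $F$, nothing lies between the plates, and the total charge is positive, as Corollary 8-26 requires.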


5. Energy 


Classically, energy is the integral of the potential with respect to the
charge, and we shall adopt the obvious analog of this definition.
Throughout this section we shall write

$$\mu = \text{dual}\ f, \qquad \nu = \text{dual}\ g.$$

Definition 8-52: If $g = Nf$ is a potential and $|\mu|N|f| < \infty$, then
its energy is defined to be $I(g) = \mu g = \nu f$, and $g$ is said to have finite
energy.


If all potentials are bounded, then all of them have finite energy.
For then $N|f|$ is a bounded potential, and

$$|\mu|N|f| = \sum_i\alpha_i|f_i|(N|f|)_i \le \Bigl(\sup_i(N|f|)_i\Bigr)(\alpha|f|) < \infty.$$

In any case, a potential of finite support has finite energy.

We can write energy either purely in terms of the charge or purely in
terms of the potential:

$$I(g) = \mu(Nf) = \nu[(I - P)g].$$

Since the dual of a number is the same number, we also have

$$I(g) = (\mu\hat N)f = [\nu(I - \hat P)]g.$$

If $f$ is a charge for $P$, then, as noted after Theorem 8-3, $f$ is also a
charge for $\hat P$. In the two processes we have

$$I(Nf) = \mu(Nf) = (\mu\hat N)f$$

and

$$\hat I(\hat Nf) = (\mu N)f = \mu(\hat Nf),$$

and the energies are equal since the matrices associate by Corollary 1-5.


The expression for $I(g)$ in Definition 8-52 disguises the fact that energy
depends only on the values of the potential on the support $E$. We
shall derive a simple dependence of $I(g)$ on $P^E$.


Proposition 8-53: If $g$ is a potential of finite energy with support in $E$,
then

$$I(g) = \nu_E[(I - P^E)g_E] = [\nu_E(I - \hat P^E)]g_E.$$

PROOF: Since $f$ has support in $E$,

$$(I - P^E)g_E = f_E$$

by Proposition 8-19, and

$$\nu_E[(I - P^E)g_E] = \nu_E f_E = \nu f = I(g).$$

The other half of the proposition is the dual of the first half.


Classically, energy is non-negative. We shall prove shortly that the
energy of a potential is non-negative provided it is finite. To do so,
we first introduce a definition. If $g = Nf$ and $\bar g = N\bar f$ are potentials
of finite energy, we define

$$(g, \bar g) = \tfrac12(\mu\bar g + \bar\mu g), \qquad \bar\mu = \text{dual}\ \bar f,$$

provided the matrix products are well defined. (We shall show soon
that this condition is always satisfied.)

Note that $(g, g) = I(g)$. We wish to show that $(g, \bar g)$ is an inner
product. The reader should verify that $(g, \bar g)$ satisfies (1), (2), and (4)
in general and (3) when all the potentials have finite support. We
shall prove (5) and the general case of (3) below.


(1) $(g, \bar g) = (\bar g, g)$.
(2) For every real number $c$, $(cg, \bar g) = c(g, \bar g)$.
(3) $(g + g', \bar g) = (g, \bar g) + (g', \bar g)$.
(4) If $g$ is a pure potential for which $(g, g) = 0$, then $g = 0$.
(5) $(g, g) \ge 0$ for all $g$.

Lemma 8-54: If $g$ has support in a finite set $E$, then

$$I(g) = \tfrac12\sum_{i\in E}\Bigl[(\alpha_i m_i + \hat m_i)g_i^2 + \sum_{j\in E}\alpha_i P^E_{ij}(g_i - g_j)^2\Bigr] \ge 0,$$

where

$$m_i = 1 - \sum_{j\in E}P^E_{ij} \ge 0 \quad\text{and}\quad \hat m_i = \alpha_i - \sum_{k\in E}\alpha_k P^E_{ki} \ge 0.$$

PROOF: We shall apply Proposition 8-53. The matrices involved are
finite matrices, so that distributivity and associativity hold. Hence

$$\begin{aligned} I(g) &= \nu_E g_E - \nu_E P^E g_E\\ &= \tfrac12\sum_{i\in E}\Bigl[2\alpha_i g_i^2 - \sum_{j\in E}2\alpha_i g_i P^E_{ij}g_j\Bigr]\\ &= \tfrac12\sum_{i\in E}\Bigl[\alpha_i\Bigl(1 - \sum_{j\in E}P^E_{ij}\Bigr)g_i^2 + \Bigl(\alpha_i - \sum_{k\in E}\alpha_k P^E_{ki}\Bigr)g_i^2 + \sum_{j\in E}\bigl(\alpha_i P^E_{ij}g_i^2 - 2\alpha_i P^E_{ij}g_i g_j + \alpha_i P^E_{ij}g_j^2\bigr)\Bigr]\\ &= \tfrac12\sum_{i\in E}\Bigl[(\alpha_i m_i + \hat m_i)g_i^2 + \sum_{j\in E}\alpha_i P^E_{ij}(g_i - g_j)^2\Bigr]. \end{aligned}$$

Since $P^E$ is a transition matrix and $\alpha_E$ is $P^E$-superregular, $m$ and $\hat m$ are
non-negative. Hence $I(g) \ge 0$.
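The identity of Lemma 8-54 can be checked numerically. This sketch reuses a hypothetical 4-state chain whose rows and columns all sum to 3/4 (so $\alpha = \mathbf 1^T$ is superregular), takes a signed charge supported in $E = \{0, 2\}$ chosen for illustration, and compares $I(g) = \mu g$ with the right-hand side of the lemma computed from $P^E$. It also checks $(I - P^E)g_E = f_E$, the step used in Proposition 8-53. The helper names are illustrative.

```python
from fractions import Fraction as F


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; A must be nonsingular."""
    n = len(A)
    M = [[F(a) for a in row] + [F(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [a / M[c][c] for a in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * z for a, z in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def dirichlet(P, boundary):
    """Bounded x with x = boundary on E = keys(boundary) and x = P x off E (i.e. B^E applied)."""
    n = len(P)
    A = [[int(i == j) - (0 if i in boundary else P[i][j]) for j in range(n)] for i in range(n)]
    return solve(A, [boundary.get(i, 0) for i in range(n)])


def watched(P, E):
    """P^E as a dict: P^E[i, j] = sum_k P_ik B^E_kj."""
    cols = {j: dirichlet(P, {k: int(k == j) for k in E}) for j in E}  # columns of B^E
    return {(i, j): sum(P[i][k] * cols[j][k] for k in range(len(P))) for i in E for j in E}


def fundamental(P):
    """N = (I - P)^(-1), computed column by column."""
    n = len(P)
    I_minus_P = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]
    cols = [solve(I_minus_P, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


P = [[F(0), F(1, 2), F(1, 4), F(0)],
     [F(1, 4), F(0), F(1, 4), F(1, 4)],
     [F(0), F(1, 4), F(0), F(1, 2)],
     [F(1, 2), F(0), F(1, 4), F(0)]]
alpha = [1, 1, 1, 1]
E = {0, 2}
f = [3, 0, -2, 0]                       # hypothetical signed charge supported in E
N = fundamental(P)
g = [sum(N[i][j] * f[j] for j in range(4)) for i in range(4)]
energy = sum(alpha[i] * f[i] * g[i] for i in range(4))      # I(g) = mu g with mu = dual f

PE = watched(P, E)
m = {i: 1 - sum(PE[i, j] for j in E) for i in E}
m_hat = {i: alpha[i] - sum(alpha[k] * PE[k, i] for k in E) for i in E}
rhs = sum((alpha[i] * m[i] + m_hat[i]) * g[i] ** 2
          + sum(alpha[i] * PE[i, j] * (g[i] - g[j]) ** 2 for j in E) for i in E) / 2
print(energy, rhs)
```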


From properties (1), (2), and (3), we can prove that Schwarz's
inequality holds for $g$ and $\bar g$ whenever they have finite support.


Lemma 8-55: If $g = Nf$ and $\bar g = N\bar f$ are two potentials of finite
support, then

$$(g, \bar g)^2 \le I(g)I(\bar g).$$


PROOF: By Lemma 8-54 we have

$$I(xg - \bar g) = (xg - \bar g, xg - \bar g) \ge 0$$

for all real $x$. Hence by properties (1), (2), and (3), we find that

$$x^2(g, g) - 2x(g, \bar g) + (\bar g, \bar g) \ge 0$$

for all real $x$. If $(g, g) = 0$, then, for $-2x(g, \bar g) + (\bar g, \bar g)$ to be non-
negative for all $x$, it must be true that $(g, \bar g) = 0$, and the lemma is
trivial. Otherwise, the discriminant of the quadratic in $x$
must be non-positive, so that

$$4(g, \bar g)^2 - 4(g, g)(\bar g, \bar g) \le 0$$

or

$$(g, \bar g)^2 \le (g, g)(\bar g, \bar g) = I(g)I(\bar g).$$


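Lemma 8-56: Let $g = Nf$ and $\bar g = N\bar f$ be pure potentials of finite
energy, let

$$E_1 \subset E_2 \subset E_3 \subset \cdots$$

be an increasing sequence of finite sets with union the set of all states $S$,
and let

$$f^{(n)} = \begin{pmatrix} f_{E_n}\\ 0\end{pmatrix}, \quad \bar f^{(n)} = \begin{pmatrix}\bar f_{E_n}\\ 0\end{pmatrix}, \quad g^{(n)} = Nf^{(n)}, \quad \bar g^{(n)} = N\bar f^{(n)}.$$

Then $(g, \bar g) = \lim_{n\to\infty}(g^{(n)}, \bar g^{(n)})$.


PROOF: We have $(g, \bar g) = \tfrac12(\mu\bar g + \bar\mu g)$, and by symmetry it is enough
to show that $\mu^{(n)}\bar g^{(n)}$ converges to $\mu\bar g$, where $\mu^{(n)} = \text{dual}\ f^{(n)}$. By monotone convergence, we
have $\lim_n\bar g^{(n)} = \bar g$, since $0 \le \bar f^{(1)} \le \bar f^{(2)} \le \cdots$. Let

$$h^{(n)}_i = \begin{cases}\bar g^{(n)}_i & \text{if } i \in E_n,\\ 0 & \text{otherwise.}\end{cases}$$

Then

$$\mu^{(n)}\bar g^{(n)} = \mu h^{(n)}$$

and

$$\lim_n h^{(n)} = \lim_n\bar g^{(n)} = \bar g.$$

The functions $h^{(n)}$ are non-negative and increasing; also $\mu$ is non-
negative. Thus by monotone convergence,

$$\lim_n\mu^{(n)}\bar g^{(n)} = \lim_n\mu h^{(n)} = \mu\lim_n h^{(n)} = \mu\bar g.$$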
Lemma 8-57: If $g$ and $\bar g$ are pure potentials of finite energy, then

$$(g, \bar g)^2 \le (g, g)(\bar g, \bar g).$$

Consequently $(g, \bar g) < \infty$.


PROOF: Form the approximations to $g$ and $\bar g$ as in the statement of
Lemma 8-56. By Lemmas 8-54 and 8-55,

$$(g^{(n)}, \bar g^{(n)})^2 \le (g^{(n)}, g^{(n)})(\bar g^{(n)}, \bar g^{(n)}).$$

Applying Lemma 8-56 to each factor, we obtain

$$(g, \bar g)^2 \le (g, g)(\bar g, \bar g).$$

If $g = Nf$ and $\bar g = N\bar f$ are any potentials of finite energy, then

$$|(g, \bar g)| \le (N|f|, N|\bar f|) \le \sqrt{I(N|f|)\,I(N|\bar f|)} < \infty.$$

Therefore $(g, \bar g)$ is always well defined. We can now prove (3) in
general by breaking charges into positive and negative parts.


Proposition 8-58: If $g$ has finite energy, then $I(g) \ge 0$.

PROOF: Write $g = Nf = Nf^+ - Nf^-$, where $Nf^+$ and $Nf^-$ are
pure potentials. Then

$$\begin{aligned} I(g) = (g, g) &= (Nf^+ - Nf^-, Nf^+ - Nf^-)\\ &= I(Nf^+) - 2(Nf^+, Nf^-) + I(Nf^-)\\ &\ge I(Nf^+) - 2\sqrt{I(Nf^+)I(Nf^-)} + I(Nf^-) && \text{by Lemma 8-57}\\ &= \Bigl(\sqrt{I(Nf^+)} - \sqrt{I(Nf^-)}\Bigr)^2\\ &\ge 0. \end{aligned}$$

We can finally prove Schwarz's inequality for all potentials of finite
energy by proceeding just as in the proof of Lemma 8-55.

We now begin the proof of the fundamental result about energy, the
theorem that justifies the name "equilibrium potential." Unfor-
tunately, the result fails for the most general transient chain, so that
some extra hypothesis is needed. We shall prove the theorem
(Theorem 8-61) under the hypothesis $P = \hat P$, a condition that is
satisfied in the classical case of the three-dimensional symmetric random
walk with $\alpha = \mathbf 1^T$.


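Lemma 8-59: The energy of the equilibrium potential on $E$ is the
capacity $C(E)$.


PROOF: $I(h^E) = \eta^E h^E = \eta^E_E h^E_E = \eta^E_E\mathbf 1 = \eta^E\mathbf 1 = C(E)$.


Lemma 8-60: If $E$ is an equilibrium set, if $g = Nf$ is a potential with
support in $E$, and if $P = \hat P$, then $(g, h^E) = \alpha f$.

PROOF:

$$\begin{aligned} 2(g, h^E) &= \mu h^E + \eta^E g\\ &= \mu h^E + \eta^E Nf\\ &= \mu h^E + \mu\hat Ne^E && \text{by duality}\\ &= \mu h^E + \mu Ne^E && \text{since } N = \hat N\\ &= 2\mu h^E && \text{since } h^E = Ne^E\\ &= 2\mu\mathbf 1 && \text{since } f \text{ has support in } E\\ &= 2\alpha f. \end{aligned}$$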


Theorem 8-61: Suppose that $P$ is a chain in which $P = \hat P$. If $E$ is
an equilibrium set, then the equilibrium potential for $E$ minimizes
energy among all potentials of finite energy whose support is in $E$ and
whose total charge is $C(E)$.

PROOF: Let $g = Nf$ be a potential with $\alpha f = C(E)$ and with support
in $E$. By Lemma 8-60, $(g, h^E) = C(E)$, and by Lemma 8-59, $I(h^E) =
C(E)$. Furthermore, $I(h^E) \ne 0$ by property (4). Therefore, by
Schwarz's inequality, we have

$$I(g) \ge \frac{(g, h^E)^2}{I(h^E)} = \frac{C(E)^2}{C(E)} = C(E) = I(h^E).$$


We shall see in the next section that the theorem need not be true if
$P \ne \hat P$.
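Theorem 8-61 can be illustrated under its hypothesis $P = \hat P$. This sketch assumes the symmetric killed walk on six states with $\alpha = \mathbf 1^T$ (so $\hat P = P^T = P$) and $E = \{1, 2, 4\}$. It perturbs the equilibrium charge $e^E$ by a few hypothetical charges of total zero supported in $E$ and checks Lemma 8-59, Lemma 8-60, and the strict minimizing property. The function `energy_pair` is an illustrative implementation of $(g, \bar g) = \frac12(\mu\bar g + \bar\mu g)$ for $\alpha = \mathbf 1^T$.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; A must be nonsingular."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [a / M[c][c] for a in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * z for a, z in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def walk(n):
    """Symmetric random walk on {0, ..., n-1}, killed when it steps off either end."""
    half = Fraction(1, 2)
    return [[half if abs(i - j) == 1 else Fraction(0) for j in range(n)] for i in range(n)]


def hitting(P, E):
    """h^E: equal to 1 on E and P-regular off E (probability of ever entering E)."""
    n = len(P)
    A = [[int(i == j) - (0 if i in E else P[i][j]) for j in range(n)] for i in range(n)]
    return solve(A, [1 if i in E else 0 for i in range(n)])


def fundamental(P):
    """N = (I - P)^(-1), computed column by column."""
    n = len(P)
    I_minus_P = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]
    cols = [solve(I_minus_P, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def energy_pair(N, f1, f2):
    """(g1, g2) = (mu1 g2 + mu2 g1) / 2 for g = N f; with alpha = 1^T, mu = dual f = f."""
    n = len(N)
    g1 = [sum(N[i][j] * f1[j] for j in range(n)) for i in range(n)]
    g2 = [sum(N[i][j] * f2[j] for j in range(n)) for i in range(n)]
    return (sum(a * b for a, b in zip(f1, g2)) + sum(a * b for a, b in zip(f2, g1))) / 2


n = 6
P = walk(n)                      # symmetric, so P_hat = P when alpha = 1^T
N = fundamental(P)
E = {1, 2, 4}
h = hitting(P, E)
e = [h[i] - sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]   # equilibrium charge e^E
C = sum(e)                                                              # C(E) = alpha e^E
# Hypothetical perturbations supported in E with total charge 0.
directions = [[0, 1, -1, 0, 0, 0],
              [0, 1, 1, 0, -2, 0],
              [0, Fraction(-1, 3), 0, 0, Fraction(1, 3), 0]]
trials = [[e[i] + d[i] for i in range(n)] for d in directions]
print(C, [energy_pair(N, f, f) for f in trials])
```

Every competitor has the same total charge $C(E)$ but strictly larger energy, since the excess equals the energy of the zero-total perturbation.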
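6. The basic example


In this section we shall work out what the results of the preceding
five sections mean in terms of the basic example.
First we compute $\hat P$, the $\beta$-dual of $P$:

$$\hat P_{ij} = \frac{\beta_j P_{ji}}{\beta_i} = \begin{cases}\beta_{i-1}p_i/\beta_i = 1 & \text{if } j = i - 1,\\ \beta_j q_{j+1}/\beta_0 = \beta_j - \beta_{j+1} & \text{if } i = 0,\\ 0 & \text{otherwise.}\end{cases}$$

Thus the reverse process proceeds deterministically a step at a time to
the left until it reaches 0. From 0 it may step into any state $j$ and does
so with probability

$$\hat P_{0j} = \beta_j - \beta_{j+1}.$$

Since $\sum_j\hat P_{0j} = 1 - \beta_\infty < 1$, where $\beta_\infty = \lim_j\beta_j > 0$, and since 0 is reached from all states with
probability one, the extended chain for $\hat P$ is absorbing. We saw in
Section 5-10 that $P$ has no non-zero regular measure; on the other hand,
$\beta$ is regular for $\hat P$ since

$$\sum_i\beta_i\hat P_{ij} = \beta_0(\beta_j - \beta_{j+1}) + \beta_{j+1}\cdot 1 = \beta_j.$$

From Section 5-10 we know that

$$N_{ij} = \begin{cases}\dfrac{\beta_j}{\beta_\infty} & \text{if } i \le j,\\[1ex] \dfrac{\beta_j}{\beta_\infty} - \dfrac{\beta_j}{\beta_i} & \text{if } i > j.\end{cases}$$

Hence

$$\hat N_{ij} = \frac{\beta_j N_{ji}}{\beta_i} = \begin{cases}\dfrac{\beta_j}{\beta_\infty} & \text{if } i \ge j,\\[1ex] \dfrac{\beta_j}{\beta_\infty} - 1 & \text{if } i < j.\end{cases}$$

We note that $N$ has columns tending to 0, but $\hat N$ has columns bounded
away from 0. The latter fact by itself implies that the extended chain
for $\hat P$ is absorbing.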
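The closed forms above can be spot-checked with exact arithmetic. The sketch assumes a particular hypothetical choice, $\beta_j = \frac12\bigl(1 + \frac1{j+1}\bigr)$, so that $\beta_0 = 1$ and $\beta_\infty = \frac12$, and verifies on a finite window of states the formula for $\hat P$, the $\hat P$-regularity of $\beta$, the identity $(I - P)N = I$ for the quoted $N$ (row by row, using $(Px)_i = p_{i+1}x_{i+1} + q_{i+1}x_0$), and the formula for $\hat N$. Matrices are represented as Python functions of $(i, j)$ so that the infinite chain never has to be truncated.

```python
from fractions import Fraction

BETA_INF = Fraction(1, 2)


def beta(j):
    """Hypothetical choice beta_j = (1 + 1/(j+1))/2: beta_0 = 1, decreasing to 1/2."""
    return (1 + Fraction(1, j + 1)) / 2


def p(j):
    """p_j = beta_j / beta_{j-1}, the probability of the step j-1 -> j in P (j >= 1)."""
    return beta(j) / beta(j - 1)


def P(i, j):
    """Basic example: from i move to i+1 w.p. p_{i+1}, otherwise return to 0."""
    return (p(i + 1) if j == i + 1 else 0) + (1 - p(i + 1) if j == 0 else 0)


def P_hat(i, j):
    """The beta-dual of P."""
    return beta(j) * P(j, i) / beta(i)


def N(i, j):
    """Fundamental matrix of P as quoted from Section 5-10."""
    return beta(j) / BETA_INF - (beta(j) / beta(i) if i > j else 0)


K = 15                      # finite window of states on which the closed forms are checked
W = range(K)
checks = {
    "P_hat steps left": all(P_hat(i, i - 1) == 1 for i in range(1, K)),
    "P_hat from 0": all(P_hat(0, j) == beta(j) - beta(j + 1) for j in W),
    "beta is P_hat-regular": all(
        sum(beta(i) * P_hat(i, j) for i in range(j + 2)) == beta(j) for j in W),
    "(I - P) N = I": all(
        N(i, j) - p(i + 1) * N(i + 1, j) - (1 - p(i + 1)) * N(0, j) == int(i == j)
        for i in W for j in W),
    "N_hat formula": all(
        beta(j) * N(j, i) / beta(i) == beta(j) / BETA_INF - int(i < j) for i in W for j in W),
}
print(checks)
```

In the regularity check only $i = 0$ and $i = j + 1$ contribute, since $\hat P_{ij} = 0$ for the other rows.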

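Next we find the general form of potentials. In $P$, if $g = Nf$, then

$$g_i = \frac{1}{\beta_\infty}\sum_{j=0}^\infty\beta_j f_j - \frac{1}{\beta_i}\sum_{j=0}^{i-1}\beta_j f_j. \tag{1}$$

In $\hat P$, if $g = \hat Nf$, then

$$g_i = \frac{1}{\beta_\infty}\sum_{j=0}^\infty\beta_j f_j - \sum_{j=i+1}^\infty f_j. \tag{2}$$

In either case, $g$ is finite-valued if $\beta|f| < \infty$, in agreement with Lemma
8-2. (For the reverse chain we have

$$\infty > \sum_j\beta_j|f_j| \ge \beta_\infty\sum_j|f_j|,$$

and hence $\sum_j|f_j| < \infty$.)

Let $\mu = \text{dual}\ f$ and $\nu = \text{dual}\ g$. Then, as required by duality,

$$\nu_i = \beta_i g_i = \frac{\beta_i}{\beta_\infty}\sum_j\beta_j f_j - \sum_{j<i}\beta_j f_j = \frac{\beta_i}{\beta_\infty}\sum_j\mu_j - \sum_{j<i}\mu_j = (\mu\hat N)_i.$$

Thus $\mu$ is a left charge with potential measure $\nu$ for $\hat P$.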
Theorem 8-4 demands that $f = (I - P)g$ when $g$ is a potential with
charge $f$. We have from (1)

$$\begin{aligned}(Pg)_i &= p_{i+1}g_{i+1} + q_{i+1}g_0\\ &= \frac{\beta_{i+1}}{\beta_i}\Bigl[\frac{\beta f}{\beta_\infty} - \frac{1}{\beta_{i+1}}\sum_{j=0}^{i}\beta_j f_j\Bigr] + \Bigl(1 - \frac{\beta_{i+1}}{\beta_i}\Bigr)\frac{\beta f}{\beta_\infty}\\ &= \frac{\beta f}{\beta_\infty} - \frac{1}{\beta_i}\sum_{j=0}^{i}\beta_j f_j. \end{aligned}$$

Hence $g_i - (Pg)_i = f_i$ and $(I - P)g = f$.
For both $P$ and $\hat P$, we have

$$N_{jj} = \hat N_{jj} = \frac{\beta_j}{\beta_\infty},$$

and thus the condition $\beta_j \ge kN_{jj}$ of Proposition 8-10 is satisfied with
$k = \beta_\infty$. Hence all potentials in both $P$ and $\hat P$ are bounded. We can
see the boundedness directly from (1) and (2). In (1) and (2) we have

$$|g_i| \le \frac{2\beta|f|}{\beta_\infty}.$$

The estimate

$$|g_i| \le \frac{\beta|f|}{\beta_\infty}$$

of Proposition 8-10 is better; it takes into account the cancellation of
the first and second terms in each expression. If $f \ge 0$ in (2), then the
$g_i$ increase monotonically to $\beta f/\beta_\infty$, so that the proposition gives
the best possible bound. In (1) we have $\lim_i g_i = 0$, in agreement with
the second half of Proposition 8-10, since the columns of $N$ tend to zero.
However, $\hat N$ does not have columns tending to zero, and $\lim_i g_i$ in (2)
is not necessarily 0.

We determine the regular functions and signed measures for $P$ and
$\hat P$ as follows. If $r$ is a $P$-regular function, then

$$r_i = (Pr)_i = p_{i+1}r_{i+1} + q_{i+1}r_0.$$

Thus

$$r_0 = p_1r_1 + q_1r_0, \qquad p_1r_0 = p_1r_1,$$

and

$$r_0 = r_1.$$

Proceeding in the same way with $i = 1, 2, \ldots$, we find $r_{i+1} = r_0$ for every $i$.
Hence only the constant functions are $P$-regular. Dually, only multiples
of $\beta$ are regular signed measures for $\hat P$. Since $P$ has no regular signed
measures, $\hat P$ has no regular functions. (Recall that $\hat P\mathbf 1 \ne \mathbf 1$.) There-
fore, the non-negative superregular functions of $P$ are pure potentials
plus non-negative constants, whereas only pure potentials are non-
negative superregular functions for $\hat P$.

Next, we determine the equilibrium sets for $P$ and $\hat P$. It is clear
that the $P$-process will be in any infinite set infinitely often a.e. Hence only finite
sets are equilibrium sets. Let us verify this fact in terms of equilibrium
potentials.

Let $E$ be a finite set with $m$ as last element. Then

$$e^E_i = \begin{cases}\dfrac{\beta_\infty}{\beta_m} & \text{if } i = m,\\[1ex] 0 & \text{otherwise,}\end{cases}$$

and

$$\beta e^E = \beta_m e^E_m = \beta_\infty.$$

By (1),

$$h^E_i = \frac{1}{\beta_\infty}(\beta e^E) - \frac{1}{\beta_i}\sum_{j=0}^{i-1}\beta_j e^E_j = 1 - \frac{1}{\beta_i}\sum_{j=0}^{i-1}\beta_j e^E_j,$$

or

$$h^E_i = \begin{cases} 1 & \text{if } i \le m,\\ 1 - \dfrac{\beta_\infty}{\beta_i} & \text{if } i > m.\end{cases}$$

It is clear probabilistically that these are the correct values for $h^E$;
when $i > m$, the only way to avoid hitting $E$ from $i$ is to march straight
out to the right. The probability of doing so is $\beta_\infty/\beta_i$. Thus we see that
$h^E$ satisfies the conditions of an equilibrium potential. All non-empty
finite sets have capacity $\beta_\infty$.
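These hitting and escape vectors can also be checked directly. The sketch again assumes the hypothetical choice $\beta_j = \frac12\bigl(1 + \frac1{j+1}\bigr)$ (so $\beta_\infty = \frac12$) and the finite set $E = \{0, 2, 5\}$ with $m = 5$. It verifies that the formula for $h^E$ is $P$-regular off $E$ on a window of states, that $e^E = (I - P)h^E$ is concentrated at $m$ with value $\beta_\infty/\beta_m$, and that $\beta e^E = \beta_\infty$. It also checks, for the reverse chain and a set with least element 3, the values $\hat h^E_i = \hat N_{i3}/\hat N_{33} = 1 - \beta_\infty/\beta_3$ below 3 and the capacity $\beta_3\hat e^E_3 = \beta_\infty$, which are derived in the following paragraphs of the text.

```python
from fractions import Fraction

BETA_INF = Fraction(1, 2)


def beta(j):
    """Hypothetical choice beta_j = (1 + 1/(j+1))/2: beta_0 = 1, decreasing to 1/2."""
    return (1 + Fraction(1, j + 1)) / 2


def p(j):
    """p_j = beta_j / beta_{j-1}, the probability of the step j-1 -> j in P (j >= 1)."""
    return beta(j) / beta(j - 1)


def P_apply(x, i):
    """(P x)_i = p_{i+1} x_{i+1} + q_{i+1} x_0 for a function x given as a callable."""
    return p(i + 1) * x(i + 1) + (1 - p(i + 1)) * x(0)


def N_hat(i, j):
    """Fundamental matrix of the reverse chain."""
    return beta(j) / BETA_INF - (1 if i < j else 0)


E, m, K = {0, 2, 5}, 5, 25


def h(i):
    """Hitting vector of the finite set E with largest element m."""
    return Fraction(1) if i <= m else 1 - BETA_INF / beta(i)


regular_off_E = all(h(i) == P_apply(h, i) for i in range(K) if i not in E)
escape = {i: h(i) - P_apply(h, i) for i in E}             # e^E = (I - P) h^E on E
capacity = sum(beta(i) * escape[i] for i in E)            # beta e^E

# Reverse chain, set with least element m_r: h_hat_i = N_hat(i, m_r) / N_hat(m_r, m_r), i < m_r.
m_r = 3
h_hat = [N_hat(i, m_r) / N_hat(m_r, m_r) for i in range(m_r)]
e_hat_m = 1 - h_hat[m_r - 1]            # escape from m_r through the forced step to m_r - 1
capacity_hat = beta(m_r) * e_hat_m
print(capacity, capacity_hat)
```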

If $E$ is an infinite set, then $h^E = \mathbf 1$ and $e^E = (I - P)h^E = 0$. Hence
$h^E$ is not a potential, and $E$ is not an equilibrium set. The fact that the
supremum of the capacities of finite subsets of an infinite set $E$ is $\beta_\infty$
does not contradict Proposition 8-39, since $\hat N$ does not have columns
tending to zero.

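For the reverse chain $\hat P$, let $E$ be any set (finite or infinite) with least
element $m$. Then $\hat h^E_i = 1$ for $i \ge m$. For $i < m$, the process enters $E$ if and only if
it eventually reaches $m$, so that

$$\hat h^E_i = \hat h^{\{m\}}_i = \frac{\hat N_{im}}{\hat N_{mm}} = \frac{(\beta_m/\beta_\infty) - 1}{\beta_m/\beta_\infty} = 1 - \frac{\beta_\infty}{\beta_m}.$$

The next to last equality follows from the fact that $m = 0$ is incom-
patible with $i < m$. The process can escape from $E$ only via $m$, and
for $m > 0$

$$\hat e^E_m = [(I - \hat P^E)\mathbf 1]_m = 1 - \hat h^E_{m-1} = \frac{\beta_\infty}{\beta_m}.$$

If $m = 0$, then every jump from 0 leads back to 0, and

$$\hat e^E_0 = 1 - \sum_j\hat P_{0j} = \beta_\infty.$$

Hence, in either case,

$$\beta\hat e^E = \beta_m\hat e^E_m = \beta_\infty.$$

Thus all sets are equilibrium sets for $\hat P$, and every non-empty set has capacity $\beta_\infty$. We
note that only the finite sets are equilibrium sets for both $P$ and $\hat P$, and
their capacity $\beta_\infty$ is the same in both, as predicted by Proposition 8-35.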

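The dual of Proposition 8-23 is that every non-negative superregular
measure is the increasing limit of pure potential measures of finite
support. We shall produce the charges for $\hat P$ which give rise to the
potential measures which increase to $\beta$. Let $E_m$ be the set $\{0, \ldots, m\}$.
The functions $h^{E_m} = Ne^{E_m}$ are the functions which Proposition 8-23
gives as increasing to $\mathbf 1$. Therefore, by duality, the measures $\eta^{E_m}\hat N$
should increase to $\beta$, and the charges we seek in $\hat P$ are the $\eta^{E_m}$. From
our above calculations we have

$$\eta^{E_m}_i = \begin{cases}\beta_\infty & \text{if } i = m,\\ 0 & \text{otherwise,}\end{cases}$$

and

$$(\eta^{E_m}\hat N)_i = \begin{cases}\beta_i & \text{if } i \le m,\\ \beta_i - \beta_\infty & \text{if } i > m.\end{cases}$$

Next suppose, as in Lemma 8-24, that $N\bar f \le Nf$ with $\bar f \ge 0$ and $\beta|f|$
finite. Since by (1)

$$(N\bar f)_0 = \frac{1}{\beta_\infty}(\beta\bar f)$$

and

$$(Nf)_0 = \frac{1}{\beta_\infty}(\beta f),$$

we must have $\beta\bar f \le \beta f$. The proof for the reverse chain is not so simple,
however. If $\hat N\bar f \le \hat Nf$ with $\bar f \ge 0$ and $\beta|f|$ finite, then

$$\frac{1}{\beta_\infty}(\beta\bar f) - \sum_{j=i+1}^\infty\bar f_j \le \frac{1}{\beta_\infty}(\beta f) - \sum_{j=i+1}^\infty f_j$$

for all $i$. According to the proof of Lemma 8-24, we multiply through
by $\eta^{E_m}$ and take the limit on $m$. We have

$$\beta\bar f - \beta_\infty\sum_{j=m+1}^\infty\bar f_j \le \beta f - \beta_\infty\sum_{j=m+1}^\infty f_j.$$

As $m \to \infty$, we obtain $\beta\bar f \le \beta f$.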

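Turning to the potential principles, we shall first illustrate the
Principle of Domination (Theorem 8-45). We do so in $\hat P$. Let $f \ge 0$
and $\bar f \ge 0$ be given, let $E$ be the support of $f$, and suppose that $\bar g_i \ge g_i$
for $i \in E$, where $g = \hat Nf$ and $\bar g = \hat N\bar f$. For convenience, suppose that $0 \in E$. From (2) we have

$$\frac{\beta\bar f}{\beta_\infty} - \sum_{j>i}\bar f_j \ge \frac{\beta f}{\beta_\infty} - \sum_{j>i}f_j \quad\text{for } i \in E.$$

Let $k$ be in $\tilde E$, and let $i$ be the largest state of $E$ for which $i < k$ ($i$ exists
since $0 \in E$). Then

$$\bar g_k = \frac{\beta\bar f}{\beta_\infty} - \sum_{j>k}\bar f_j \ge \frac{\beta\bar f}{\beta_\infty} - \sum_{j>i}\bar f_j \ge \frac{\beta f}{\beta_\infty} - \sum_{j>i}f_j = \frac{\beta f}{\beta_\infty} - \sum_{j>k}f_j = g_k,$$

the next to last equality holding since $f_j = 0$ for $i < j \le k$. Hence
$\bar g \ge g$.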

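Next we examine the Principle of Balayage (Theorem 8-46) for $\hat P$. Let $f \ge 0$ be given, and for convenience let $E$ be an infinite set containing 0. We wish to choose $\bar f$ with support in $E$ so that $\bar g_E = g_E$, where $g = \hat N f$ and $\bar g = \hat N\bar f$. For $i \in E$, we must have

$$\frac{\beta\bar f}{\beta_\infty} - \sum_{j>i}\bar f_j = \frac{\beta f}{\beta_\infty} - \sum_{j>i} f_j .$$

Let $k$ be the next element of $E$ greater than $i$. We must also have

$$\frac{\beta\bar f}{\beta_\infty} - \sum_{j>k}\bar f_j = \frac{\beta f}{\beta_\infty} - \sum_{j>k} f_j .$$

Subtracting, we find that $\bar f_k = \sum_{j=i+1}^{k} f_j$. This equation determines all of $\bar f$ except for $\bar f_0$. Adding the relations for $\bar f_k$ with $k > 0$, we obtain

$$\sum_{j>0}\bar f_j = \sum_{j>0} f_j .$$

Since for $\bar g_0 = g_0$ we need

$$\frac{\beta\bar f}{\beta_\infty} - \sum_{j>0}\bar f_j = \frac{\beta f}{\beta_\infty} - \sum_{j>0} f_j ,$$

we must choose $\beta\bar f = \beta f$. Thus set

$$\bar f_0 = f_0 + \sum_{j>0}\beta_j(f_j - \bar f_j), \qquad \bar f_k = \sum_{j=i+1}^{k} f_j \quad\text{for } i, k \in E \text{ and } j \notin E \text{ when } i < j < k.$$

To see that we have actually chosen $\bar f_0 \ge 0$, we note that $\beta_j$ decreases with $j$, so that if $i, k \in E$ with no $j \in E$ for $i < j < k$, then

$$\begin{aligned}
&\beta_{i+1}(f_{i+1} - \bar f_{i+1}) + \beta_{i+2}(f_{i+2} - \bar f_{i+2}) + \cdots + \beta_k(f_k - \bar f_k) \\
&\quad= \beta_{i+1}f_{i+1} + \beta_{i+2}f_{i+2} + \cdots + \beta_k f_k - \beta_k\bar f_k \\
&\quad= \beta_{i+1}f_{i+1} + \cdots + \beta_k f_k - \beta_k(f_{i+1} + \cdots + f_k) \\
&\quad= (\beta_{i+1} - \beta_k)f_{i+1} + \cdots + (\beta_{k-1} - \beta_k)f_{k-1} \\
&\quad\ge 0.
\end{aligned}$$

Since $E$ is infinite and contains 0, every $j > 0$ lies in exactly one such block $i < j \le k$, so summing over the blocks gives $\bar f_0 - f_0 \ge 0$, and in particular $\bar f_0 \ge 0$.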
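As a check on the balayage construction, here is a small Python sketch (an illustration, not from the text) that builds $\bar f$ from $f$ and $E$ and confirms that the two $\hat P$-potentials agree on $E$ and that $\bar f_0 \ge 0$. It assumes the form $(\hat N f)_i = \beta f/\beta_\infty - \sum_{j>i} f_j$ used above, a hypothetical choice $\beta_j = \tfrac12 + 2^{-(j+1)}$ (so $\beta_0 = 1$ and $\beta_\infty = \tfrac12$), and a truncated state space on which $f$ vanishes beyond the largest listed element of $E$. The function names are illustrative only.

```python
from fractions import Fraction

def potential_hat(f, beta, beta_inf, i):
    # (N-hat f)_i = (beta f)/beta_inf - sum_{j > i} f_j, the form of (2) used above
    bf = sum(b * x for b, x in zip(beta, f))
    return bf / beta_inf - sum(f[i + 1:])

def balayage_charge(f, beta, E):
    # E: sorted list of states containing 0; f is assumed to vanish beyond max(E)
    fbar = [Fraction(0)] * len(f)
    for i, k in zip(E, E[1:]):
        # fbar_k collects the charge of f lying in the gap (i, k]
        fbar[k] = sum(f[i + 1:k + 1])
    # fbar_0 is fixed by the requirement beta fbar = beta f (here beta_0 = 1)
    fbar[0] = f[0] + sum(beta[j] * (f[j] - fbar[j]) for j in range(1, len(f)))
    return fbar

n = 8
beta = [Fraction(1, 2) + Fraction(1, 2 ** (j + 1)) for j in range(n)]
beta_inf = Fraction(1, 2)
f = [Fraction(x) for x in (0, 1, 2, 0, 3, 1, 0, 0)]
E = [0, 2, 5, 7]
fbar = balayage_charge(f, beta, E)
print(fbar[0])  # prints 11/64
```

Exact fractions are used so that equality of the potentials on $E$ can be tested exactly rather than up to rounding.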


We shall illustrate the Principle of Condensers (Theorem 8-51) for $P$ in the case $E = \{0, 1, \ldots, a\}$ and $F = \{b, b + 1, \ldots\}$ with $0 \le a < b$. We have

$$g_i = \begin{cases} 1 & \text{for } i \le a \\ 1 - \dfrac{\beta_b}{\beta_i} & \text{for } a < i < b \\ 0 & \text{for } i \ge b. \end{cases}$$

Then

$$f_i = g_i - p_i g_{i+1} - q_i g_0 = \begin{cases} \dfrac{\beta_b}{\beta_a} & \text{if } i = a \\ -q_i & \text{if } i \ge b \\ 0 & \text{otherwise.} \end{cases}$$

Hence

$$\beta f = \beta_b - \sum_{i \ge b}\beta_i q_i = \beta_\infty .$$

We can verify from (1) that $g$ is the potential of $f$, and we see that $f^+$ has support in $E$, $f^-$ has support in $F$, and that $\beta f > 0$.
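The condenser computation can likewise be verified with exact arithmetic. The sketch below assumes, as the formula for $f_i$ above indicates, that $P_{i,i+1} = p_i$ and $P_{i,0} = q_i$ with $p_i = \beta_{i+1}/\beta_i$. The sample $\beta$ and the helper `condenser_check` are hypothetical. On the finite list of states $0, \ldots, n$ the total charge telescopes to $\beta_n$, which decreases to $\beta_\infty$ as $n$ grows.

```python
from fractions import Fraction

def condenser_check(beta, a, b):
    # beta[0..n] with beta[0] = 1 and decreasing; p_i = beta_{i+1}/beta_i, q_i = 1 - p_i
    n = len(beta) - 1
    p = [beta[i + 1] / beta[i] for i in range(n)]
    q = [1 - x for x in p]

    def g(i):
        if i <= a:
            return Fraction(1)
        if i < b:
            return 1 - beta[b] / beta[i]  # probability of reaching E before F
        return Fraction(0)

    # f = (I - P) g for the chain P_{i,i+1} = p_i, P_{i,0} = q_i
    f = [g(i) - p[i] * g(i + 1) - q[i] * g(0) for i in range(n)]
    return f, p, q

beta = [Fraction(1, 2) + Fraction(1, 2 ** (j + 1)) for j in range(12)]
f, p, q = condenser_check(beta, a=2, b=5)
```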

We can show by example that the Principle of Condensers does not hold for all equilibrium sets $E$. In $\hat P$ let $E$ be the set of even states and let $F$ be the set of odd states. All sets are equilibrium sets for $\hat P$, but $E$ does not have a finite boundary. If

$$g_i = \begin{cases} 1 & \text{for } i \in E \\ 0 & \text{for } i \in F, \end{cases}$$

the theorem requires that $g$ be a potential. But if $i \in F$,

$$f_i = 0 - (\hat P g)_i = -1 .$$

Hence

$$(\hat N f^-)_0 = \sum_{j \text{ odd}} \hat N_{0j} = \sum_{j \text{ odd}}\left(\frac{\beta_j}{\beta_\infty} - 1\right).$$


The expression on the right side may be infinite if the β’s are chosen properly. Let
1 
Ries 


Bo = b. 


The β’s determine the transition probabilities uniquely, and for this choice $(\hat N f^-)_0$ is infinite, a contradiction.

Equation (1) gives us the following relation for energy in the $P$-process:

$$I(g) = \frac{1}{\beta_\infty}(\beta f)^2 - \sum_i f_i\sum_{j<i}\beta_j f_j .$$

It is not difficult to see that (2) yields the same value for the same $f$. But it is not easy to see that $I(g) \ge 0$ if $g$ is not a pure potential. We shall now show that Theorem 8-61 fails if the assumption $\hat P = P$ is dropped. In $P$ let $E = \{0, 1, \ldots, m\}$. We have seen that $C(E) = \beta_\infty$. Thus any potential with total charge $\beta f = \beta_\infty$, equal to that of the equilibrium potential, has energy

$$\beta_\infty - \sum_i f_i\sum_{j<i}\beta_j f_j .$$
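The equilibrium potential has energy $\beta_\infty$, which is a maximum (not a minimum) among pure potentials. For example, let

$$f_i = \begin{cases} \dfrac{\beta_\infty}{2\beta_0} & \text{if } i = 0 \\[1ex] \dfrac{\beta_\infty}{2\beta_1} & \text{if } i = 1 \\[1ex] 0 & \text{otherwise.} \end{cases}$$

Then $\beta f = \beta_\infty$ and

$$I(g) = \beta_\infty - f_1(\beta_0 f_0) = \beta_\infty - \frac{1}{4\beta_1}\beta_\infty^2 < \beta_\infty .$$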
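To see the energy formula in action, the following Python sketch (an illustration under stated assumptions, not the authors' computation) evaluates $I(g) = \mu g$, with $\mu_i = \beta_i f_i$ the dual of $f$, directly from a closed form of $N$ and compares it with the formula above. The closed form is an assumption derived from $H$ and $N_{jj}$ for the chain $P_{i,i+1} = p_i$, $P_{i0} = q_i$: $N_{ij} = \beta_j/\beta_\infty$ for $j \ge i$ and $N_{ij} = \beta_j/\beta_\infty - \beta_j/\beta_i$ for $j < i$. It reproduces $(Nf)_0 = \beta f/\beta_\infty$. The sample $\beta$ is hypothetical.

```python
from fractions import Fraction

def N(i, j, beta, beta_inf):
    # assumed closed form of the fundamental matrix for P_{i,i+1} = p_i, P_{i0} = q_i
    if j >= i:
        return beta[j] / beta_inf
    return beta[j] / beta_inf - beta[j] / beta[i]

def energy_direct(f, beta, beta_inf):
    # I(g) = mu g with mu_i = beta_i f_i and g = N f (f has finite support)
    n = len(f)
    g = [sum(N(i, j, beta, beta_inf) * f[j] for j in range(n)) for i in range(n)]
    return sum(beta[i] * f[i] * g[i] for i in range(n))

def energy_formula(f, beta, beta_inf):
    # (beta f)^2 / beta_inf - sum_i f_i sum_{j<i} beta_j f_j
    bf = sum(b * x for b, x in zip(beta, f))
    cross = sum(f[i] * sum(beta[j] * f[j] for j in range(i)) for i in range(len(f)))
    return bf * bf / beta_inf - cross

beta = [Fraction(1, 2) + Fraction(1, 2 ** (j + 1)) for j in range(10)]
beta_inf = Fraction(1, 2)
# the example charge from the text: f_0 = beta_inf/(2 beta_0), f_1 = beta_inf/(2 beta_1)
example = [beta_inf / (2 * beta[0]), beta_inf / (2 * beta[1])] + [Fraction(0)] * 8
```

For this choice $\beta_1 = \tfrac34$, so the example's energy is $\tfrac12 - \tfrac1{12} = \tfrac5{12}$, strictly below the equilibrium value $\beta_\infty = \tfrac12$.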


7. An unbounded potential 


In Proposition 8-10 we saw that a sufficient condition for all potentials to be bounded is that $\alpha$ be chosen so that $\alpha \ge kN_i$ for all $i$, where $N_i$ is the $i$th row of $N$. The
purpose of this section is to show that unbounded potentials may exist 
when this hypothesis is not satisfied; the potential we exhibit will have a 
bounded charge whose support is at the same time an equilibrium set 
and a dual equilibrium set. 

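The chain $P$ will be a modification of sums of independent random variables on the line with $p_1 = \tfrac14$ and $p_{-1} = \tfrac34$. Let

$$P_{i,i-1} = \tfrac34, \qquad P_{i,i+1} = \tfrac14 \qquad\text{for } i \le 0$$

and

$$P_{ii} = 1 - \frac1i, \qquad P_{i,i-1} = \frac{3}{4i}, \qquad P_{i,i+1} = \frac{1}{4i} \qquad\text{for } i > 0.$$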

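If the process is watched only when it changes states, it becomes the $p_1 = \tfrac14$, $p_{-1} = \tfrac34$ process, so that we may compute $H$ from the latter chain:

$$H_{ij} = \begin{cases} 1 & \text{if } i > j \\ \left(\tfrac13\right)^{j-i} & \text{if } i < j. \end{cases}$$

Therefore,

$$H_{jj} = P_{j,j+1}\cdot 1 + P_{jj} + P_{j,j-1}\cdot\tfrac13 = 1 - \tfrac23 P_{j,j-1}$$

and $N_{jj} = 1/(1 - H_{jj}) = 3/(2P_{j,j-1})$.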
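Hence

$$N_{ij} = H_{ij}N_{jj} = \begin{cases} 2 & \text{if } j \le 0,\ i \ge j \\ 2\left(\tfrac13\right)^{j-i} & \text{if } j \le 0,\ i < j \\ 2j & \text{if } j > 0,\ i \ge j \\ 2j\left(\tfrac13\right)^{j-i} & \text{if } j > 0,\ i < j. \end{cases}$$

Let $E = \{1, 2, 3, \ldots\}$. Since the process goes toward $-\infty$ with probability one, it can be in $E$ only finitely often a.e. Moreover, $e^E_i = 0$ unless $i = 1$, so that $\alpha e^E < \infty$ for any choice of $\alpha$. Thus $E$ is an equilibrium set.

We shall take $\alpha$ to be the zeroth row of $N$. Then

$$\alpha_j = N_{0j} = \begin{cases} 2 & \text{if } j \le 0 \\ 2j\left(\tfrac13\right)^{j} & \text{if } j > 0. \end{cases}$$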


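If we calculate $\hat P$, we find that

$$\hat P_{ij} = P_{ij} \quad\text{for } i > 0, \qquad \hat P_{i,i+1} = \tfrac34,\ \ \hat P_{i,i-1} = \tfrac14 \quad\text{for } i < 0,$$

and

$$\hat P_{01} = \hat P_{0,-1} = \tfrac14 .$$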


With probability one the $\hat P$ process reaches 0 from all states, and from there it can disappear. Hence the extended chain for $\hat P$ is absorbing, and $\hat P$ is in any set only finitely often a.e. As before, $\alpha\hat e^E < \infty$, and $E$ is therefore an equilibrium set for $\hat P$.
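The computation of $\hat P$ is mechanical, so it can be sketched in a few lines of Python with exact fractions. The sketch assumes the transition probabilities and the measure $\alpha_j = N_{0j}$ written above and forms $\hat P_{ij} = \alpha_j P_{ji}/\alpha_i$ on a window of states. It also shows that row 0 of $\hat P$ sums to $\tfrac12$: since $\alpha$ is a row of $N$, $\alpha P = \alpha - \delta_0$, so $\sum_j \hat P_{0j} = (\alpha_0 - 1)/\alpha_0$. This deficit is where the $\hat P$ process can disappear.

```python
from fractions import Fraction as F

def P(i, j):
    # transition probabilities of the modified walk of this section
    if i <= 0:
        return {i - 1: F(3, 4), i + 1: F(1, 4)}.get(j, F(0))
    return {i - 1: F(3, 4 * i), i: 1 - F(1, i), i + 1: F(1, 4 * i)}.get(j, F(0))

def alpha(j):
    # zeroth row of N: N_{0j} = H_{0j} N_{jj}
    return F(2) if j <= 0 else 2 * j * F(1, 3) ** j

def P_hat(i, j):
    # dual chain with respect to alpha
    return alpha(j) * P(j, i) / alpha(i)
```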

Thus $E$ is both an equilibrium set and a dual equilibrium set. We shall choose a bounded charge with support in $E$. Let

$$f_i = \begin{cases} 1 & \text{if } i \in E \\ 0 & \text{otherwise.} \end{cases}$$


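Then

$$\alpha f = \sum_{j>0}\alpha_j = \sum_{j>0} 2j\left(\tfrac13\right)^{j} = \frac{2\cdot\frac13}{\left(1 - \frac13\right)^2} = \frac32 .$$

Thus $f$ is a charge. Its potential is

$$g_i = \sum_{j>0} N_{ij} = \begin{cases} \displaystyle\sum_{j>0} 2j\left(\tfrac13\right)^{j-i} & \text{if } i \le 0 \\[2ex] \displaystyle\sum_{j=1}^{i} 2j + \sum_{j>i} 2j\left(\tfrac13\right)^{j-i} & \text{if } i > 0. \end{cases}$$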
Summing these expressions, we find that

$$g_i = \begin{cases} \tfrac32\left(\tfrac13\right)^{|i|} & \text{for } i \le 0 \\ \tfrac12\left(2i^2 + 4i + 3\right) & \text{for } i > 0. \end{cases}$$

Thus $\lim_{i\to+\infty} g_i = +\infty$, and $g$ is unbounded.

We note that $P^n g$ for large $n$ is a weighted sum of $g$ values along paths that the process is likely to take. Thus the fact that $\lim_{i\to+\infty} g_i = +\infty$ does not contradict $P^n g \to 0$, since the process moves toward $+\infty$ with probability zero. On the other hand, in the direction that the process does go, namely $-\infty$, we do have $\lim_{i\to-\infty} g_i = 0$.
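The two sums for $g_i$ can be compared numerically with the closed form. This minimal sketch truncates the series after a fixed number of terms (an assumption; the neglected tail is far below the tolerance used in the check). It takes $N_{jj} = 2$ for $j \le 0$ and $N_{jj} = 2j$ for $j > 0$, as computed above.

```python
from fractions import Fraction as F

def N(i, j):
    # N_ij = H_ij N_jj, with H_ij = (1/3)^(j-i) for i < j and 1 for i >= j
    njj = 2 if j <= 0 else 2 * j
    return F(njj) if i >= j else njj * F(1, 3) ** (j - i)

def g_series(i, terms=200):
    # g_i = sum_{j > 0} N_ij, truncated after `terms` states
    return sum(N(i, j) for j in range(1, terms))

def g_closed(i):
    if i <= 0:
        return F(3, 2) * F(1, 3) ** (-i)
    return F(2 * i * i + 4 * i + 3, 2)
```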


8. Applications of potential-theoretic methods 


Many useful quantities for transient chains arise as means of non-negative random variables: $h_i = M_i[z]$. To compute $h$ we can often use a systems theorem argument (Theorem 4-11 with the random time identically one) to obtain an expression of the form $h = Ph + f$. Under appropriate circumstances we may write $(I - P)h = f$, and if $h$ can be shown to be a potential, we conclude $h = Nf$. The purpose of this section is to give some sufficient conditions under which all these steps are valid and to apply the results.

We first restrict our attention to the case of a bounded non-negative 
random variable z. Later in this section we extend our results to 
obtain Theorem 8-67, which is a powerful tool applicable even if the 
vector h is not necessarily finite-valued. 

To maximize the number of potentials, we choose a row of $N$ as $\alpha$. Then all finite-valued functions of the form $Nf$ are potentials, and a set $E$ is an equilibrium set if and only if the process is in $E$ only finitely often a.e. Let $z$ be a bounded non-negative random variable, and let $z^{(1)}(\omega) = z(\omega_1)$. We shall assume that $z^{(1)} \le z$. Then $M_i[z]$ is finite for all $i$, and $M_i[z^{(1)}] \le M_i[z]$. Define

$$h_i = M_i[z] \quad\text{and}\quad f_i = M_i[z - z^{(1)}] \ge 0.$$

Lemma 8-62: The column vector $h$ is superregular and satisfies $(I - P)h = f$. Furthermore, $z^{(\infty)} = \lim_{n\to\infty} z^{(n)}$ exists and is finite.


PROOF: By Theorem 4-11 with the random time taken to be identically one, we have

$$M_i[z^{(1)}] = \sum_k P_{ik}M_k[z] = (Ph)_i .$$

Therefore,

$$[(I - P)h]_i = M_i[z] - M_i[z^{(1)}] = M_i[z - z^{(1)}] = f_i .$$

Since $f \ge 0$, $h$ is superregular; and since $z^{(1)} \le z$, we have

$$z \ge z^{(1)} \ge z^{(2)} \ge \cdots \ge 0 .$$

Therefore $z^{(\infty)}$ exists and is finite.


Lemma 8-63: The function $h$ satisfies $h = Nf$ if and only if $z^{(\infty)} = 0$ almost everywhere.

PROOF: By dominated convergence,

$$M_i[z^{(\infty)}] = \lim_n M_i[z^{(n)}],$$

and by Theorem 4-11 with the random time $n$,

$$\{M_i[z^{(n)}]\} = P^n\{M_i[z]\} = P^n h .$$

Hence

$$M_i[z^{(\infty)}] = \lim_n (P^n h)_i .$$

By Theorem 5-10,

$$h_i = (Nf)_i + M_i[z^{(\infty)}] .$$

Thus $h = Nf$ if and only if $M_i[z^{(\infty)}] = 0$ for every $i$. Since $z^{(\infty)} \ge 0$, $M_i[z^{(\infty)}] = 0$ for all $i$ if and only if $z^{(\infty)} = 0$ a.e.


Lemma 8-64: If either of these conditions is satisfied, then $h = Nf$:

(1) There exists an equilibrium set $E$ such that $z(\omega) = 0$ for every path $\omega$ which does not go through $E$.

(2) The enlarged chain is absorbing and $z(\omega) = 0$ for every path $\omega$ which begins in the absorbing state.

PROOF: The second condition is just the first for $E$, the set of all transient states. For the first condition, on every path which does not pass through $E$, $z(\omega) = 0$, so that $z^{(n)}(\omega) = 0$ for every $n \ge 1$ on such paths. On almost all paths which do pass through $E$, there is an $n$ which is a function of the path and which denotes the last time the process passes through $E$ on that path. Therefore, on almost all paths there is an $n$ depending on the path such that $z^{(n+1)}(\omega) = 0$. Hence $z^{(\infty)} = 0$ a.e., and the result follows by Lemma 8-63.
We shall now generalize our considerations. Let $z(\omega)$ be a non-negative random variable with $h_i = M_i[z]$ not necessarily finite. Suppose $z \ge z^{(1)}$, and define ${}^m z(\omega) = \min(m, z(\omega))$. We agree to set $z - z^{(1)} = 0$ at all points where $z^{(1)}$ is infinite. Then

$$z(\omega) = \lim_{m\to\infty} {}^m z(\omega).$$

The crucial property of the functions ${}^m z$ is that

$$({}^m z)^{(1)} = {}^m(z^{(1)}).$$

We denote the common value of $({}^m z)^{(1)}$ and ${}^m(z^{(1)})$ by ${}^m z^{(1)}$, and we define ${}^m z^{(n)}$ analogously.


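Lemma 8-65: If $z \ge z^{(1)} \ge 0$, then $z - z^{(1)} \ge {}^m z - {}^m z^{(1)} \ge 0$.

PROOF:

$${}^m z - {}^m z^{(1)} = \begin{cases} z - z^{(1)} & \text{if } z \le m \\ 0 & \text{if } z^{(1)} \ge m \\ m - z^{(1)} & \text{if } z > m > z^{(1)}, \end{cases}$$

and in each case the value lies between 0 and $z - z^{(1)}$.

Define vectors ${}^m h$ and ${}^m f$ by

$${}^m h_i = M_i[{}^m z] \quad\text{and}\quad {}^m f_i = M_i[{}^m z - {}^m z^{(1)}].$$

Lemma 8-66: ${}^{m+1}f \ge {}^m f$ and $\lim_m {}^m f = f$.

PROOF: We note that ${}^m({}^{m+1}z) = {}^m z$. Applying Lemma 8-65 to ${}^{m+1}z$, we find that ${}^{m+1}z - {}^{m+1}z^{(1)} \ge {}^m z - {}^m z^{(1)}$. Then ${}^{m+1}f_i \ge {}^m f_i$. Since $\lim_m({}^m z - {}^m z^{(1)}) = z - z^{(1)}$ a.e., $\lim_m {}^m f = f$ by the Monotone Convergence Theorem.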

Theorem 8-67: Let $z$ be a non-negative random variable. Define

$$\begin{aligned}
z^{(1)}(\omega) &= z(\omega_1) \\
z(\omega) - z^{(1)}(\omega) &= 0 \quad\text{when } z^{(1)}(\omega) = +\infty \\
h_i &= M_i[z] \\
f_i &= M_i[z - z^{(1)}],
\end{aligned}$$

and suppose that $z \ge z^{(1)}$. If $z$ satisfies either one of the following conditions, then $h = Nf$:

(1) There exists an equilibrium set $E$ such that $z(\omega) = 0$ for every path $\omega$ which does not go through $E$.

(2) The extended chain is absorbing and $z(\omega) = 0$ for every path $\omega$ which begins in the absorbing state.

PROOF: For each $m$, ${}^m z$ is bounded. If one of the conditions applies to $z$, then it applies to ${}^m z$ because $0 \le {}^m z \le z$. Hence by Lemma 8-64,

$${}^m h = N\,{}^m f .$$

By Lemma 8-66, ${}^m f$ increases to $f$, and by monotone convergence ${}^m h$ increases to $h$. Therefore $h = Nf$ by the Monotone Convergence Theorem.


We now apply Theorem 8-67 in four special cases. 

First let $P$ be an absorbing chain with fundamental matrix $N$. We define $a^{(r)}_i = M_i[a(\omega)^r]$, where $a(\omega)$ is the absorption time defined in Chapter 5. The column vector $a^{(r)}$ is indexed by the transient states of $P$.

Proposition 8-68: If $P$ is an absorbing chain, then

$$a^{(r)} = \sum_{m=1}^{r-1}\binom{r}{m} NQa^{(m)} + N1 .$$


PROOF: Start the process in a transient state $i$, and let $a$ be the absorption time. Since $i$ is transient,

$$a^r(\omega) = (a + 1)^r(\omega_1)$$

so that

$$a^r(\omega) - a^r(\omega_1) = (a + 1)^r(\omega_1) - a^r(\omega_1)$$

or

$$(a^r - (a^r)^{(1)})(\omega) = ((a + 1)^r - a^r)(\omega_1).$$

By Theorem 4-11 with the random time identically one,

$$\begin{aligned}
M_i[a^r - (a^r)^{(1)}] &= \sum_k P_{ik}M_k[(a + 1)^r - a^r] \\
&= \sum_{k\ \text{abs.}} R_{ik}\cdot 1 + \sum_{k\ \text{trans.}} Q_{ik}\sum_{m=0}^{r-1}\binom{r}{m}a^{(m)}_k \\
&= \sum_{m=1}^{r-1}\binom{r}{m}\sum_k Q_{ik}a^{(m)}_k + 1
\end{aligned}$$

since $\sum_k R_{ik} + \sum_k Q_{ik} = 1$ (here $a^{(0)} = 1$). In Theorem 8-67 let $z = a^r$. Then $z$ satisfies $z \ge z^{(1)}$, and condition (2) of the theorem holds for $z$. As we have just shown,

$$f = \sum_{m=1}^{r-1}\binom{r}{m}Qa^{(m)} + 1,$$

and we also have $h = a^{(r)}$. Hence $h = Nf$ by the theorem.
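Proposition 8-68 gives a recursion that is easy to program. The sketch below assumes a finite absorbing chain whose transient-to-transient block is `Q`. It computes $a^{(r)}$ from the recursion and compares the result with a brute-force sum over the distribution of the absorption time, using $\Pr_i[a > n] = (Q^n 1)_i$. The example is a hypothetical one: the simple random walk on $\{0, \ldots, 4\}$ absorbed at 0 and 4, whose mean absorption times from 1, 2, 3 are 3, 4, 3.

```python
import numpy as np
from math import comb

def absorption_moments(Q, rmax):
    # a^{(r)} = sum_{m=1}^{r-1} C(r, m) N Q a^{(m)} + N 1   (Proposition 8-68)
    t = Q.shape[0]
    N = np.linalg.inv(np.eye(t) - Q)
    one = np.ones(t)
    a = {}
    for r in range(1, rmax + 1):
        f = one + sum(comb(r, m) * (Q @ a[m]) for m in range(1, r))
        a[r] = N @ f
    return a

def brute_moments(Q, r, nmax=2000):
    # M_i[a^r] = sum_{n >= 1} n^r Pr_i[a = n], with Pr_i[a > n] = (Q^n 1)_i
    surv_prev = np.ones(Q.shape[0])
    total = np.zeros(Q.shape[0])
    for n in range(1, nmax + 1):
        surv = Q @ surv_prev
        total += n ** r * (surv_prev - surv)
        surv_prev = surv
    return total

Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
a = absorption_moments(Q, 4)
```

The test also checks the two inequalities proved in Corollary 8-69, using the constants $2^r - 1$ and $2$ that appear in its proof.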


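Corollary 8-69: Let $P$ be an absorbing chain. Then there exist real numbers $c$ and $d$ such that

$$a^{(r)} \le cNa^{(r-1)} \le d\,a^{(r)} .$$

In particular, $a^{(r)}$ is finite-valued if and only if $Na^{(r-1)}$ is finite-valued.

PROOF: By Proposition 8-68,

$$a^{(r)} = N\sum_{m=1}^{r-1}\binom{r}{m}Qa^{(m)} + N1,$$

even if both sides are infinite. Hence

$$\begin{aligned}
a^{(r)} &\le (2^r - 2)NQa^{(r-1)} + N1 &&\text{since } a^{(m)} \le a^{(r-1)} \text{ for } m \le r - 1 \\
&= (2^r - 2)(N - I)a^{(r-1)} + N1 &&\text{since } N - I = NQ \\
&\le (2^r - 2)Na^{(r-1)} + N1 \\
&\le (2^r - 1)Na^{(r-1)} &&\text{since } 1 \le a^{(r-1)}.
\end{aligned}$$

For the other inequality,

$$\begin{aligned}
a^{(r)} &\ge NQa^{(r-1)} \\
&= Na^{(r-1)} - a^{(r-1)} \\
&\ge Na^{(r-1)} - a^{(r)} &&\text{since } a^{(r-1)} \le a^{(r)}.
\end{aligned}$$

Hence $Na^{(r-1)} \le 2a^{(r)}$, and we may take $c = 2^r - 1$ and $d = 2(2^r - 1)$.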


As a second example, let $P$ be a transient chain, and for any two transient states $i$ and $j$ define

$$W_{ij} = M_i[n_j^2].$$

The reader may verify with the aid of Theorem 4-11 that if $z = n_j^2$, then

$$M_i[z - z^{(1)}] = (2N_{jj} - 1)\delta_{ij}.$$

Now $\{j\}$ is an equilibrium set, and $n_j^2 = 0$ for all paths not going through this set. Hence, for fixed $j$, the column vector $h$ with $h_i = W_{ij}$ satisfies condition (1) of Theorem 8-67. Therefore,

$$W_{ij} = N_{ij}(2N_{jj} - 1).$$

We note that $W$ is finite-valued; similarly one shows that $M_i[n_j^r] < \infty$ for $r > 2$.
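The identity $W_{ij} = N_{ij}(2N_{jj} - 1)$ can be checked against the direct expansion $M_i[n_j^2] = \sum_{m,n}\Pr_i[x_m = j,\ x_n = j]$, in which the diagonal terms $m = n$ give $N_{ij}$ and the off-diagonal ones give $2N_{ij}(N_{jj} - 1)$. The sketch below assumes a hypothetical substochastic matrix `Q`, so the chain may stop, and truncates the double sum at `M` steps. The test also confirms that the variance $W_{ij} - N_{ij}^2$ is non-negative.

```python
import numpy as np

def second_moment_visits(Q):
    # W_ij = M_i[n_j^2] = N_ij (2 N_jj - 1), with N = (I - Q)^{-1}
    N = np.linalg.inv(np.eye(Q.shape[0]) - Q)
    return N * (2 * np.diag(N) - 1)[None, :]

def brute_second_moment(Q, M=300):
    # sum_m (Q^m)_ij [1 + 2 sum_{d=1}^{M-m} (Q^d)_jj], truncated at M steps
    t = Q.shape[0]
    powers = [np.eye(t)]
    for _ in range(M):
        powers.append(powers[-1] @ Q)
    diag_cum = [np.zeros(t)]
    for d in range(1, M + 1):
        diag_cum.append(diag_cum[-1] + np.diag(powers[d]))
    return sum(powers[m] * (1 + 2 * diag_cum[M - m])[None, :] for m in range(M + 1))

Q = np.array([[0.2, 0.5, 0.1],
              [0.3, 0.1, 0.4],
              [0.1, 0.3, 0.2]])
W = second_moment_visits(Q)
```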

Next let $E$ be an equilibrium set and let $z$ be the time at which the process is in $E$ for the last time (or 0 if $E$ is never reached). Set $v^E_i = M_i[z]$. Then

$$z - z^{(1)} = \begin{cases} 1 & \text{if } E \text{ is ever reached after time } 0 \\ 0 & \text{otherwise.} \end{cases}$$

Hence $M_i[z - z^{(1)}] = \bar h^E_i$, the probability that, from $i$, $E$ is reached after time 0. The random variable $z$ satisfies condition (1) of the theorem. Hence

$$v^E = N\bar h^E .$$
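Similarly, $v^E = N\bar h^E$ can be compared with $M_i[z] = \sum_{n\ge1}\Pr_i[z \ge n] = \sum_{n\ge1}(P^n h^E)_i$. This works because the last visit to $E$ occurs at time $n$ or later exactly when the process is in $E$ at some time $\ge n$. The sketch assumes a hypothetical finite transient chain given by a substochastic matrix. It computes $h^E$ by solving $h = 1$ on $E$ and $h = Qh$ off $E$, and uses $\bar h^E = Qh^E$.

```python
import numpy as np

def hitting_prob(Q, E):
    # h^E_i = Pr_i[process is ever in E]; E is a list of states, not all of them
    t = Q.shape[0]
    off = [i for i in range(t) if i not in E]
    h = np.zeros(t)
    h[E] = 1.0
    A = np.eye(len(off)) - Q[np.ix_(off, off)]
    b = Q[np.ix_(off, E)].sum(axis=1)
    h[off] = np.linalg.solve(A, b)
    return h

Q = np.array([[0.2, 0.5, 0.1, 0.0],
              [0.3, 0.1, 0.4, 0.1],
              [0.1, 0.3, 0.2, 0.2],
              [0.0, 0.2, 0.3, 0.3]])
E = [1, 3]
h = hitting_prob(Q, E)
N = np.linalg.inv(np.eye(4) - Q)
v = N @ (Q @ h)          # v^E = N h-bar^E with h-bar^E = Q h^E
```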


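Finally, let $E$ be an equilibrium set and let $j$ be any state. Let $z_j$ be the number of times in $j$ before $E$ is left for the last time (or 0 if $E$ is never reached). Define

$$N^E_{ij} = M_i[z_j].$$

Then $z_j$ satisfies condition (1) of the theorem, and we have

$$z_j - z_j^{(1)} = \begin{cases} 1 & \text{if } x_0(\omega) = j \text{ and } E \text{ is ever reached} \\ 0 & \text{otherwise.} \end{cases}$$

Hence

$$M_i[z_j - z_j^{(1)}] = \delta_{ij}h^E_j,$$

and $N^E_{ij} = N_{ij}h^E_j$.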


9. General denumerable stochastic processes 


We shall show that any denumerable stochastic process can be 
represented within a transient Markov chain in such a manner that 
potential theory applied to the chain yields corresponding results for 
the stochastic process. 

Throughout this section we shall deal with a probability space $\Omega$ with measure $\mu$, and a fixed sequence $\{\mathcal{R}_n\}$ of partitions of $\Omega$. Each $\mathcal{R}_n$ has a denumerable number of cells, and $\mathcal{R}_n \subset \mathcal{R}_{n+1}$. For convenience, we assume that $\mathcal{R}_0 = \{\Omega\}$. We recall that $(f_n, \mathcal{R}_n)$ is a stochastic process if $f_n$ is constant on each cell of $\mathcal{R}_n$. (This condition is Definition 2-5 expressed in terms of partitions.) If $U \in \mathcal{R}_n$, define $f_n(U)$ to be this constant value.


Definition 8-70: If $\{\mathcal{R}_n\}$ is a sequence of partitions, the space-time Markov chain for $\{\mathcal{R}_n\}$ is defined to be a Markov chain whose states are all ordered pairs $\langle U, n\rangle$, where $U \in \mathcal{R}_n$ and $\mu(U) > 0$, and whose transition probabilities are

$$P_{\langle U,n\rangle,\langle V,m\rangle} = \begin{cases} \dfrac{\mu(V)}{\mu(U)} & \text{if } m = n + 1 \text{ and } V \subset U \\ 0 & \text{otherwise.} \end{cases}$$

The chain is started in the state $\langle\Omega, 0\rangle$, which will be called state 0.


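Proposition 8-71: The space-time chain is transient, and if $i = \langle U, n\rangle$, then

$$N_{0i} = H_{0i} = P^{(n)}_{0i} = \mu(U).$$

PROOF: State $i$ can be entered only on the $n$th step, as is clear from Definition 8-70. Hence the chain is in $i$ at most once, and $N_{0i} = H_{0i} = P^{(n)}_{0i}$. Along the path from 0 to $i$ there is a unique sequence of cells

$$U \subset U_{n-1} \subset U_{n-2} \subset \cdots \subset U_1 \subset \Omega \qquad\text{with } U_k \in \mathcal{R}_k .$$

Then

$$P^{(n)}_{0i} = P_{\langle\Omega,0\rangle,\langle U_1,1\rangle}P_{\langle U_1,1\rangle,\langle U_2,2\rangle}\cdots P_{\langle U_{n-1},n-1\rangle,\langle U,n\rangle} = \frac{\mu(U_1)}{\mu(\Omega)}\cdot\frac{\mu(U_2)}{\mu(U_1)}\cdots\frac{\mu(U)}{\mu(U_{n-1})} = \mu(U).$$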

For example, let $\Omega$ be a sequence space with some probability measure $\mu$, and let $\mathcal{R}_n$ be the partition such that $\mathcal{R}_n^* = \mathcal{F}_n$. As usual, we may think of a cell of $\mathcal{R}_n$ as a path $\langle i_1, i_2, \ldots, i_n\rangle$ of length $n$ in $\Omega$. A state of the space-time chain may also be thought of as such a path, and the chain moves from $\langle i_1, i_2, \ldots, i_n\rangle$ to $\langle j_1, j_2, \ldots, j_{n+1}\rangle$ only if $j_k = i_k$ for $1 \le k \le n$. The probability of such a transition is

$$\frac{\mu(V)}{\mu(U)} = \Pr[x_{n+1} = j_{n+1} \mid x_1 = i_1 \wedge \cdots \wedge x_n = i_n].$$

The starting state 0 may be thought of as the empty sequence.
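Proposition 8-71 can be illustrated on the sequence space of independent coin tosses. In the hypothetical sketch below, where the bias $\tfrac13$ and all names are arbitrary choices, a state $\langle U, n\rangle$ is represented by the tuple of the first $n$ outcomes. Transitions follow Definition 8-70, and the $n$-step probability from the empty sequence is checked to equal $\mu(U)$.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 3)  # hypothetical bias: Pr[x_k = 1] = 1/3, tosses independent

def mu(path):
    # measure of the cell {x_1 = i_1, ..., x_n = i_n}; the empty path is Omega
    ones = sum(path)
    return p ** ones * (1 - p) ** (len(path) - ones)

def transition(u, v):
    # Definition 8-70: <U, n> -> <V, n+1> with probability mu(V)/mu(U) when V lies in U
    if len(v) == len(u) + 1 and v[:len(u)] == u:
        return mu(v) / mu(u)
    return Fraction(0)

def n_step_from_zero(path):
    # product of one-step probabilities along the unique chain of cells Omega, U_1, ..., U
    prob = Fraction(1)
    for k in range(len(path)):
        prob *= transition(path[:k], path[:k + 1])
    return prob

for path in product((0, 1), repeat=2):
    print(path, n_step_from_zero(path))
```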
Definition 8-72: If $(f_n, \mathcal{R}_n)$ is a stochastic process, then the function $f$ defined on the states of a space-time chain by

$$f_{\langle U,n\rangle} = f_n(U)$$

is said to correspond to the process $(f_n, \mathcal{R}_n)$.

We write $f \sim (f_n, \mathcal{R}_n)$ when $f$ corresponds to $(f_n, \mathcal{R}_n)$. If we identify two stochastic processes $(f_n, \mathcal{R}_n)$ and $(g_n, \mathcal{R}_n)$ when $f_n = g_n$ a.e. for every $n$, then the correspondence between all stochastic processes on $\{\mathcal{R}_n\}$ and all functions on the states of the space-time chain is one-one. We now restrict ourselves to the case where the $f_n$ are real-valued functions. Under the following definition the correspondence preserves inequalities, linear combinations, and limits.


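Definition 8-73: Operations on stochastic processes are defined by

(1) $(f_n, \mathcal{R}_n) \le (g_n, \mathcal{R}_n)$ if $f_n \le g_n$ a.e. for all $n$.
(2) $a(f_n, \mathcal{R}_n) + b(g_n, \mathcal{R}_n) = (af_n + bg_n, \mathcal{R}_n)$.
(3) $\lim_m (f_n^{(m)}, \mathcal{R}_n) = (f_n, \mathcal{R}_n)$ if $\lim_m f_n^{(m)} = f_n$ a.e. for all $n$.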


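Lemma 8-74: If $(f_n, \mathcal{R}_n) \sim f$, then

$$(M[f_{n+k} \mid \mathcal{R}_n], \mathcal{R}_n) \sim P^k f$$

in the sense that if either quantity is well-defined, then so is the other; and if they are both well-defined, then they correspond.

PROOF: We shall proceed by induction on $k$. If $k = 0$, the result is trivial. Suppose that both quantities exist for some $k$ and that they correspond. Then

$$(P^{k+1}f)_i = \sum_j P_{ij}(P^k f)_j$$

or

$$(P^{k+1}f)_{\langle U,n\rangle} = \sum_{\substack{V \subset U \\ V \in \mathcal{R}_{n+1}}}\frac{\mu(V)}{\mu(U)}(P^k f)_{\langle V,n+1\rangle}.$$

By inductive hypothesis, $(M[f_{n+1+k} \mid \mathcal{R}_{n+1}], \mathcal{R}_{n+1}) \sim P^k f$. Hence, by definition of the correspondence,

$$(P^k f)_{\langle V,n+1\rangle} = M[f_{n+1+k} \mid \mathcal{R}_{n+1}](V) = \frac{1}{\mu(V)}\sum_{\substack{W \subset V \\ W \in \mathcal{R}_{n+1+k}}}\mu(W)f_{n+1+k}(W),$$

and

$$\begin{aligned}
(P^{k+1}f)_{\langle U,n\rangle} &= \sum_{\substack{V \subset U \\ V \in \mathcal{R}_{n+1}}}\frac{\mu(V)}{\mu(U)}\cdot\frac{1}{\mu(V)}\sum_{\substack{W \subset V \\ W \in \mathcal{R}_{n+1+k}}}\mu(W)f_{n+1+k}(W) \\
&= \frac{1}{\mu(U)}\sum_{\substack{W \subset U \\ W \in \mathcal{R}_{n+1+k}}}\mu(W)f_{n+1+k}(W) \\
&= M[f_{n+1+k} \mid \mathcal{R}_n](U).
\end{aligned}$$

That is, $P^{k+1}f$ exists if and only if $M[f_{n+(k+1)} \mid \mathcal{R}_n]$ does, and if they exist, then they correspond.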


Proposition 8-75: If $(h_n, \mathcal{R}_n) \sim h$, then $(h_n, \mathcal{R}_n)$ is a supermartingale (martingale) if and only if $h$ is superregular (regular) and $P^k h$ is finite-valued for all $k$.

PROOF: If $h$ is superregular, then $Ph \le h$. Hence, by Lemma 8-74, $M[h_{n+1} \mid \mathcal{R}_n] \le h_n$ a.e. for all $n$. If $P^k h$ is finite-valued, then

$$(P^k h)_0 = \sum_{U \in \mathcal{R}_k}\mu(U)h_k(U) = M[h_k]$$

is finite. Since $h_k$ is constant on cells of $\mathcal{R}_k$, Definition 3-5 is satisfied.

Conversely, if $(h_n, \mathcal{R}_n)$ is a supermartingale, then $M[h_{n+1} \mid \mathcal{R}_n] \le h_n$, and hence $Ph \le h$ by Lemma 8-74. Moreover, $M[h_{n+k}] = (P^{n+k}h)_0$ is finite. If $i = \langle U, n\rangle$, then $P^{(n)}_{0i} = \mu(U) > 0$. From

$$(P^{n+k}h)_0 = \sum_j P^{(n)}_{0j}(P^k h)_j$$

we see that $(P^k h)_i$ must be finite. The proof for martingales simply replaces $Ph \le h$ by $Ph = h$.


Definition 8-76: $(f_n, \mathcal{R}_n)$ is a stochastic process charge with potential $(g_n, \mathcal{R}_n)$ if $\sum_n M[|f_n|] < \infty$ and if $(g_n, \mathcal{R}_n) = \sum_k (M[f_{n+k} \mid \mathcal{R}_n], \mathcal{R}_n)$. If, in addition, $f_n \ge 0$, then the potential is called a pure potential.

We shall make use of potential theory results for the space-time chain $P$. As the distinguished measure $\alpha$, we select $\alpha_i = N_{0i} > 0$. As usual, if $\alpha|f|$ is finite, then $f$ is a charge.


Proposition 8-77: Charge functions correspond to charge stochastic processes, and their potentials also correspond.

PROOF: If $(f_n, \mathcal{R}_n) \sim f$, then by Proposition 8-71

$$\alpha|f| = \sum_n\sum_{U \in \mathcal{R}_n}\mu(U)|f_n(U)| = \sum_n M[|f_n|].$$

Since sums and limits are preserved, we have by Lemma 8-74

$$g = Nf = \sum_k P^k f \sim \sum_k (M[f_{n+k} \mid \mathcal{R}_n], \mathcal{R}_n).$$

Thus $g$ is the potential of $f$ if and only if the two conditions of Definition 8-76 are fulfilled by $(f_n, \mathcal{R}_n)$ and $(g_n, \mathcal{R}_n)$.


We give two applications of the correspondence. The first is a 
decomposition of non-negative supermartingales. 


Proposition 8-78: If $(h_n, \mathcal{R}_n)$ is a non-negative supermartingale, then there is a unique representation

$$(h_n, \mathcal{R}_n) = (r_n, \mathcal{R}_n) + (g_n, \mathcal{R}_n)$$

of $(h_n, \mathcal{R}_n)$ as the sum of a martingale and a potential. In the representation the martingale is a non-negative martingale, the potential is a pure potential, and $r_n$ satisfies $r_n = \lim_k M[h_{n+k} \mid \mathcal{R}_n]$. Moreover, $(g_n, \mathcal{R}_n)$ is the difference between a martingale and a process consisting of an increasing sequence of random variables.


PROOF: Existence and uniqueness of the representation follow immediately from Theorem 5-10 and Propositions 8-75 and 8-77. Then $r_n = \lim_k M[h_{n+k} \mid \mathcal{R}_n]$ by Lemma 8-74. For the last part, let $(f_n, \mathcal{R}_n)$ be the charge of $(g_n, \mathcal{R}_n)$. Set

$$s_n = f_0 + \cdots + f_{n-1} \quad\text{and}\quad s = \lim_n s_n .$$

Then $s_n$ increases monotonically to $s$, and

$$M[s] = \sum_n M[f_n] < \infty$$

by monotone convergence. Since $f_n \ge 0$,

$$g_n = \sum_k M[f_{n+k} \mid \mathcal{R}_n] = M\Big[\sum_k f_{n+k} \,\Big|\, \mathcal{R}_n\Big] = M[s - s_n \mid \mathcal{R}_n] = M[s \mid \mathcal{R}_n] - s_n .$$

Hence $\{g_n\}$ is the difference between the martingale $\{M[s \mid \mathcal{R}_n]\}$ and the increasing sequence $s_n$.


As the second application, we give a proof of the Upcrossing Lemma, 
Proposition 3-11, as it applies to non-negative supermartingales. The 
present estimate is better than the one in Chapter 3. 


Proposition 8-79: Let $r$ and $s$ be real numbers with $0 < r < s$. Let $\beta(\omega)$ be the number of upcrossings on $\omega$ of $[r, s]$ by the non-negative supermartingale $(f_n, \mathcal{R}_n)$ up to time $n$. Then

$$M[\beta] \le \frac{r}{s - r}.$$
PROOF: Let $f \sim (f_n, \mathcal{R}_n)$; $f$ is non-negative superregular. Let $E$ and $F$ be the sets of states in the space-time chain defined by

$$E = \{\langle U, m\rangle \mid m \le n \text{ and } f_m(U) \le r\}, \qquad F = \{\langle U, m\rangle \mid m \le n \text{ and } f_m(U) \ge s\}.$$

Hence $f_i \le r$ for $i \in E$ and $f_j \ge s$ for $j \in F$. For any other state $i = \langle U, m\rangle$ with $m \le n$, $r < f_i < s$. Now $h^F_0 = (B^F 1)_0$ is the probability that the random variables $f_0, \ldots, f_n$ ever take on a value greater than or equal to $s$. Similarly, $(B^E B^F 1)_0$ is the probability of at least one upcrossing, and in general $[(B^E B^F)^k 1]_0$ is the probability of at least $k$ upcrossings by $f_0, f_1, \ldots, f_n$. Hence

$$M[\beta] = \sum_{k\ge1}\Pr[\beta \ge k] = \sum_{k\ge1}[(B^E B^F)^k 1]_0 .$$

Since the chain cannot be in $F$ after time $n$, $F$ is an equilibrium set, and $h^F = B^F 1$ is a potential. For $i \in F$

$$(B^F 1)_i = 1 \le \frac1s f_i .$$

By the Principle of Domination (Theorem 8-45),

$$B^F 1 \le \frac1s f$$

everywhere. Hence $B^E B^F 1 \le (1/s)B^E f$. For $i \in E$,

$$(B^E f)_i \le r .$$

Since $f$ is superregular and since $r1$ is superregular,

$$B^E f \le r1$$

everywhere by conclusion (2) of Proposition 8-16. Thus

$$B^E B^F 1 \le \frac rs 1 .$$

By induction,

$$(B^E B^F)^k 1 \le \left(\frac rs\right)^k 1,$$

and hence

$$M[\beta] \le \sum_{k\ge1}\left(\frac rs\right)^k = \frac{r}{s - r}.$$
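A quick simulation illustrates the bound $M[\beta] \le r/(s - r)$. The sketch assumes a hypothetical non-negative martingale $f_{k+1} = f_k U_{k+1}$, where the $U_k$ are independent and equal $\tfrac12$ or $\tfrac32$ with equal probability. It counts upcrossings of $[r, s]$ along simulated paths and compares the sample mean with the bound. Being a Monte Carlo estimate with a fixed seed, it is only approximate.

```python
import random

def count_upcrossings(path, r, s):
    # counts passages of the path from a value <= r to a later value >= s
    count, below = 0, False
    for x in path:
        if x <= r:
            below = True
        elif x >= s and below:
            count += 1
            below = False
    return count

def mean_upcrossings(r, s, n, trials, seed=0):
    # hypothetical martingale: f_0 = 1, f_{k+1} = f_k * U with U = 1/2 or 3/2
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, path = 1.0, [1.0]
        for _ in range(n):
            x *= rng.choice((0.5, 1.5))
            path.append(x)
        total += count_upcrossings(path, r, s)
    return total / trials

estimate = mean_upcrossings(r=1.0, s=2.0, n=50, trials=10000)
bound = 1.0 / (2.0 - 1.0)  # r / (s - r)
```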


10. Problems

1. If $P$ is a transient chain whose states communicate and if $\alpha > 0$ is superregular, show that

(a) $\hat P$ stops (disappears) a.e. if and only if $\alpha$ is a potential measure.

(b) The mean stopping time is finite if and only if $\alpha$ is a charge. [Hint: Adjoin an absorbing state to $\hat P$.]

2. Let $P$ be a recurrent chain, and let $E$ be a finite set. Show that for any specified values of $h_E$ there is a unique bounded function $h$ with the specified values which is regular on $\tilde E$. Show that $h$ takes on its maximum on $E$.

3. Illustrate the result of the previous problem for the symmetric random walk on the integers with the specified values $h_0 = 0$ and $h_1 = 1$.

4. Let $P$ have only transient states, let $\pi$ be a specified probability vector, and let $\alpha = \pi N$. Let $f \ge 0$ be a charge, and define the random variable

$$s = \sum_n f(x_n).$$

Show that $s$ is finite a.e. and that $M_\pi[s] = \alpha f$. What is the value of $M_i[s]$? Give a game interpretation.

5. In the framework of Problem 4, introduce a second charge $\bar f$, its potential $\bar g$, and $\bar s$. Prove that if $\mu$ = dual $f$, then
(9,9) = 4M, [ss] + dxf. 
Find the corresponding expression for I(g). 


6. If $P$ is a chain with only transient states, let $\mathrm{Var}_{ij}$ be the variance of $n_j$ for the process started at $i$. Show that

$$\mathrm{Var}_{ij} = N_{ij}(2N_{jj} - N_{ij} - 1).$$


7. In the framework of Problem 6, if $P1 = 1$, prove that for each $j$ there is
an i such that Var, > N; 


Problems 8 to 19 refer to the following Markov chain: The states are the non-negative integers. From state $i$ either the process moves one step to the right with probability $p_i > 0$, or it remains at $i$.

8. Find $H$ and $N$.

9. Give a simple characterization of

(a) the regular functions,

(b) the non-negative superregular functions,

(c) the pure potentials, where $\alpha_j = N_{0j}$,

(d) the potentials, where $\alpha_j = N_{0j}$.

10. What does Theorem 5-10 say about this chain?

11. If $g$ is a potential with charge $f$, give a simple characterization in terms of $g$ of the support of $f$. Of the support of $f^+$.

12. Use Problems 9 and 11 to verify that Theorem 8-45 holds for this chain. [Hint: Distinguish the cases where the support of $f^+$ is finite and where it is infinite.]
13. Use Problems 9 and 11 to construct a counterexample to Theorem 8-45 if the assumption $h \ge 0$ is omitted.

14. If $\alpha_j = N_{0j}$, form $\hat P$ and compute $\hat N$.

15. For $E = \{0, 1, 2\}$, find $\bar h^E$. [See the end of Section 8.] Show that $v^E = N\bar h^E$ has the desired interpretation. [Remember that the chain starts at time 0, not at time 1.]

16. Show that $C(E) = 1$ for all equilibrium sets. Show that both the hypothesis and the conclusion of Proposition 8-39 are false for $P$.

17. Show that $\hat C(E) = 1$ for all sets $E$. Show that both the hypothesis and the conclusion of Proposition 8-39 hold for $\hat P$.

18. Let $E = \{0, 1, \ldots,\}$. Find $B^E$. If $g$ is a potential, what is the form of its balayage potential on $E$? Show that Theorem 8-46 is satisfied.


19. Show that if $\mu$ = dual $f$, then
1 Aal 
I(9)=53 (> m) +5 Dm 
2\4 24 


Problems 20 to 30 refer to sums of independent random variables on the 
integers with p_, = 4. and p, = 4. Use the results of Problems 12 to 19 in 
Chapter 5. 

20. Show that there are two essentially different positive regular measures 
and that all regular measures are linear combinations of the two basic 
measures. 

21. Show that if $f \ge 0$ and $\alpha f < +\infty$ for either $\alpha$, then $g = Nf$ is finite-valued. Show also that $\lim_n P^n g = 0$.

22. Let $E = \{0, 1, \ldots, n\}$. Compute $B^E$. Choose a non-negative superregular function $h$ (not a constant), and verify the various parts of Proposition 8-16.

23. For $E$ as above, compute $e^E$, and verify that $Ne^E = h^E$.

24. For $E$ as above, compute $C(E) = \alpha e^E$ for each of the two basic measures. What happens as $n$ increases?

25. Form $\hat P$ and compute $\hat N$ for each of the two basic measures. In each case, does $\hat N$ have columns tending to 0?

26. Use the results of the last two problems to show that each assignment of capacities is consistent with Proposition 8-39, even though $\lim_n C(E)$ is finite in one case and not in the other.

27. Show that there are infinite equilibrium sets for this chain. 

28. Choose « = 17. Prove that if lim,- +œ An = 0 and > |h; — hi_,| < œ, 
then A is a potential. 

29. Choose a function $h$ satisfying the conditions of Problem 28, compute its charge $f$, and check that $h = Nf$.

30. Let E = {0,1, 2}. Compute P? and the fundamental matrix of this 
finite chain. Verify that the latter is Np. 


CHAPTER 9 


RECURRENT POTENTIAL THEORY 


1. Potentials 


Throughout this chapter $P$ is a recurrent chain which is either null or noncyclic ergodic. For such a chain, $\lim_n P^n$ always exists; we let $L = \lim_n P^n$.

In a recurrent chain the non-negative finite-valued superregular measures are uniquely determined up to multiplication by a constant, and the non-zero ones are positive and regular. We choose one such non-zero regular measure and call it $\alpha$.

If $P$ is noncyclic ergodic, then $L_{ij} = \alpha_j/\sum_k\alpha_k$; whereas if $P$ is null, then $L_{ij} = 0$. In either case, $L_{ij} - L_{kk}\alpha_j/\alpha_k = 0$ for all $i$, $j$, and $k$.

Duality for $P$ is defined with respect to the regular measure $\alpha$. The dual $\hat P$ of a null chain is null, and the dual of a noncyclic ergodic chain is noncyclic ergodic. In general, if two results are duals, we shall prove only one of the pair. As usual, the key to the proof by duality of the second result is that $P$ is the most general chain of the type we consider in this chapter.

As Definition 9-1 suggests, we define charges and potentials in the 
same way as in transient potential theory. 


Definition 9-1: If $\mu$ is a signed measure with $|\mu|1$ finite and if

$$\nu = \lim_n[\mu(I + P + \cdots + P^n)]$$

exists and is finite-valued, then $\mu$ is called a left charge with potential measure $\nu$ and total charge $\mu 1$. If $f$ is a function with $\alpha|f|$ finite and if $g = \lim_n[(I + P + \cdots + P^n)f]$ exists and is finite-valued, then $f$ is called a right charge with potential function $g$ and total charge $\alpha f$. The support of a charge is the set on which the charge is not zero; the support of a potential is the support of its charge.
The definition of the support of a potential is not justified until we prove a uniqueness theorem for the charge of a potential, but such a result will follow directly from conclusion (2) of Theorem 9-15.

We note that the dual of a left (right) charge for $P$ is a right (left) charge for $\hat P$ and that their total charges are the same (since the dual of a number is the same number).

Although we adopt the same definitions as with transient chains, the results are sometimes significantly different. For example, the only pure potential is the zero potential: If $f \ge 0$, then by monotone convergence $\lim_n[(I + P + \cdots + P^n)f]_i = \sum_j N_{ij}f_j$, where $N_{ij} = +\infty$ for every $i$ and $j$. Thus the limit is finite-valued only if $f = 0$.

On the other hand, every row (or column) of $I - P$ is a charge, and the potential of the $i$th row (column) of $I - P$ is the $i$th row (column) of $I - L$. For if $\mu$ is the $i$th row of $I - P$, then $|\mu|1 \le 2 < \infty$ and

$$\nu_j = \lim_n[\mu(I + P + \cdots + P^n)]_j = \lim_n(I - P^{n+1})_{ij} = (I - L)_{ij};$$

the assertion for columns is dual.
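For a finite noncyclic ergodic chain the statement about rows of $I - P$ can be seen numerically, since $\mu(I + P + \cdots + P^n)$ telescopes to the $i$th row of $I - P^{n+1}$. The chain below is a hypothetical example with stationary vector $(\tfrac14, \tfrac12, \tfrac14)$, so every row of $L$ equals that vector. The test also checks that each such charge has total charge $\mu 1 = 0$.

```python
import numpy as np

# hypothetical noncyclic ergodic chain with stationary vector (1/4, 1/2, 1/4)
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])
L = np.tile(pi, (3, 1))  # L = lim P^n: every row equals pi

def partial_potential(mu, P, n):
    # mu (I + P + ... + P^n), accumulated term by term
    acc, term = np.zeros_like(mu), mu.copy()
    for _ in range(n + 1):
        acc += term
        term = term @ P
    return acc
```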

Our first potential theory result will be that every charge has total charge zero. To prove this fact, we require the Doeblin Ratio Limit Theorem, Theorem 9-4. We recall that $H^{(n)}_{ij}$ is the probability starting in $i$ of reaching $j$ before or at time $n$ and that $N^{(n)}_{ij}$ is the mean number of times the process started at $i$ is in $j$ up to and including time $n$. Hence $H^{(n)}_{ij} = \sum_{k=0}^{n}F^{(k)}_{ij}$ and $N^{(n)}_{ij} = \sum_{k=0}^{n}(P^k)_{ij}$; from the latter relation we see that $\hat N^{(n)}_{ij} = (\alpha_j/\alpha_i)N^{(n)}_{ji}$. In terms of this notation the Doeblin Ratio Limit Theorem states that in any Markov chain with a positive superregular measure

$$\lim_n\frac{N^{(n)}_{ij}}{N^{(n)}_{i'j'}}$$

exists and is finite for any states $i$, $j$, $i'$, and $j'$ which communicate. We shall give a simple proof of this important result.

Lemma 9-2: Let $P$ be any Markov chain with a positive superregular measure $\alpha$. If $i$ and $j$ communicate, then the quantities

$$N^{(n)}_{jj} - N^{(n)}_{ij} \quad\text{and}\quad N^{(n)}_{ii}\frac{\alpha_j}{\alpha_i} - N^{(n)}_{ij}$$

are non-negative and bounded. In particular, $|N^{(n)}_{jj} - N^{(n)}_{ij}| \le {}^iN_{jj}$.
PROOF: If $i$ and $j$ communicate, then ${}^iH_{jj} < 1$, so that ${}^iN_{jj} = 1/(1 - {}^iH_{jj}) < \infty$. It is clear that

$$N^{(n)}_{jj} - N^{(n)}_{ij} \ge 0$$

and that

$$N^{(n)}_{jj} \le {}^iN_{jj} + N^{(n)}_{ij},$$

and hence the first expression is non-negative and bounded by ${}^iN_{jj} < \infty$.

By duality, $N^{(n)}_{jj} - N^{(n)}_{ji}(\alpha_j/\alpha_i)$ is non-negative and bounded; if we multiply by $\alpha_i/\alpha_j$ and interchange $j$ and $i$, we obtain the second result.


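Lemma 9-3: Let $P$ be any Markov chain with a positive superregular measure $\alpha$. If $i$ and $j$ communicate, then for all $n$

$$\frac{N^{(n)}_{ij}}{N^{(n)}_{jj}} \le 1 \quad\text{and}\quad \frac{N^{(n)}_{ij}}{N^{(n)}_{ii}} \le \frac{\alpha_j}{\alpha_i}$$

and

$$\lim_n\frac{N^{(n)}_{ij}}{N^{(n)}_{jj}} = H_{ij} \quad\text{and}\quad \lim_n\frac{N^{(n)}_{ij}}{N^{(n)}_{ii}} = \frac{\alpha_j}{\alpha_i}\hat H_{ji} = {}^iN_{ij}.$$

PROOF: By Lemma 9-2,

$$0 \le N^{(n)}_{jj} - N^{(n)}_{ij} \le c \quad\text{for all } n.$$

Hence

$$0 \le 1 - \frac{N^{(n)}_{ij}}{N^{(n)}_{jj}} \le \frac{c}{N^{(n)}_{jj}}.$$

Therefore $N^{(n)}_{ij}/N^{(n)}_{jj} \le 1$ for all $n$. If $j$ is recurrent, then $N^{(n)}_{jj} \to +\infty$, and the ratio $N^{(n)}_{ij}/N^{(n)}_{jj}$ must tend to 1; we have $H_{ij} = 1$ since $i$ must be recurrent if $i$ and $j$ communicate. Hence

$$\lim_n\frac{N^{(n)}_{ij}}{N^{(n)}_{jj}} = H_{ij}.$$

If $j$ is transient, then

$$\lim_n\frac{N^{(n)}_{ij}}{N^{(n)}_{jj}} = \frac{N_{ij}}{N_{jj}} = \frac{H_{ij}N_{jj}}{N_{jj}} = H_{ij}.$$

The other results are duals, and the assertion about ${}^iN_{ij}$ follows from Corollary 6-20.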


The following is the Ratio Limit Theorem.

Theorem 9-4: Let $P$ be any Markov chain with a positive superregular measure $\alpha$, and let $i$, $j$, $i'$, and $j'$ be any states which communicate. Then

$$\lim_n\frac{N^{(n)}_{ij}}{N^{(n)}_{i'j'}}$$

exists. If all four states are recurrent, the limit is $\alpha_j/\alpha_{j'}$. If the states are transient, the limit is

$$\frac{H_{ij}\,{}^{j'}N_{j'j}}{H_{i'j'}H_{j'j}}.$$

Remark: Since the states in question communicate, they must be either all recurrent or all transient.

PROOF: Write

$$\frac{N^{(n)}_{ij}}{N^{(n)}_{i'j'}} = \left[\frac{N^{(n)}_{ij}}{N^{(n)}_{jj}}\right]\left[\frac{N^{(n)}_{jj}}{N^{(n)}_{j'j}}\right]\left[\frac{N^{(n)}_{j'j}}{N^{(n)}_{j'j'}}\right]\left[\frac{N^{(n)}_{j'j'}}{N^{(n)}_{i'j'}}\right]$$

and apply Lemma 9-3 to each factor.
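The ratio limit theorem and the inequalities of Lemma 9-3 can be observed on a small ergodic chain, where every ratio $N^{(n)}_{ij}/N^{(n)}_{i'j'}$ should approach $\alpha_j/\alpha_{j'}$. The sketch below uses a hypothetical three-state chain with $\alpha = (\tfrac14, \tfrac12, \tfrac14)$ and a large but finite $n$, so the computed ratio is only close to its limit (the error here is of order $1/n$).

```python
import numpy as np

# hypothetical ergodic chain with regular measure alpha = (1/4, 1/2, 1/4)
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
alpha = np.array([0.25, 0.5, 0.25])

def truncated_N(P, n):
    # N^{(n)} = I + P + ... + P^n
    acc, term = np.zeros_like(P), np.eye(P.shape[0])
    for _ in range(n + 1):
        acc += term
        term = term @ P
    return acc

Nn = truncated_N(P, 5000)
ratio = Nn[0, 1] / Nn[2, 0]  # should be close to alpha_1 / alpha_0 = 2
```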
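Proposition 9-5: Every charge has total charge zero.

PROOF: If $\mu$ is a left charge, then $\sum_i|\mu_i| < \infty$ and

$$\nu_j = \lim_n\sum_i\mu_i N^{(n)}_{ij}$$

is finite. Therefore, since $N^{(n)}_{jj} \to +\infty$,

$$0 = \lim_n\frac{\nu_j}{N^{(n)}_{jj}} = \lim_n\sum_i\mu_i\frac{N^{(n)}_{ij}}{N^{(n)}_{jj}}.$$

Since by Lemma 9-3, $N^{(n)}_{ij}/N^{(n)}_{jj} \le 1$, dominated convergence gives

$$0 = \sum_i\mu_i\lim_n\frac{N^{(n)}_{ij}}{N^{(n)}_{jj}} = \sum_i\mu_i H_{ij} = \sum_i\mu_i .$$

The result for functions is dual.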


The condition that a function $f$ satisfy $\alpha f = 0$ is a strong necessary
condition for it to be a charge, but it is by no means sufficient. In 
fact, it is not even sufficient in general if f also has finite support. We 
shall return to discuss this point at length in Section 2. 

We now establish as Theorem 9-7 an identity which will play a 
fundamental role when we develop an operator which transforms 
charges into potentials. 


Lemma 9-6: Let $\{a_n\}$ and $\{b_n\}$ be two sequences of real numbers such that $a_n \ge 0$, $\sum a_n = a < \infty$, $|b_n| \le B$, and $|b_n - b_{n-1}| \to 0$. Then

$$\lim_{n\to\infty}\sum_{k=0}^{n}a_k(b_n - b_{n-k}) = 0.$$
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Proor: Let « > 0 be given. Choose N sufficiently large that 
Deon G < €/(4B) and pick N’ large enough so that for all n > N’ 


€ 


|b, 7 ba-l < AN 


Then for n > N + N’, we have 


k +1 


x 
Ii 
o 


n N n 
> (by as a) < = Ay |On ame by -xl + > a,(|b,| EJ |b, - xl) 
k=0 =N 


N kl n 
< a, >, |bn-; — bn-j-1] + 2B È a 
k=o  j=0 k=N+1 
N à  k-l e : 
< — 2 i -jz N 
< EPA san + TR 2B, Sme n -jz 
€ x € 
<> $ 
~ 2a 2m +3 
se 
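The Python sketch below is a purely numerical illustration of Lemma 9-6. It is an addition, not part of the original argument. It evaluates $\sum_{k=0}^n a_k(b_n - b_{n-k})$ for the hypothetical choices $a_k = 2^{-k}$ and $b_n = \sin\sqrt{n}$. These satisfy the hypotheses, since $|b_n| \le 1$ and $|b_n - b_{n-1}| \le 1/(2\sqrt{n-1})$, even though $b_n$ itself has no limit. The function name `lemma_9_6_sum` is our own.

```python
import math


def lemma_9_6_sum(n, a, b):
    """Return sum_{k=0}^{n} a(k) * (b(n) - b(n - k))."""
    bn = b(n)
    return sum(a(k) * (bn - b(n - k)) for k in range(n + 1))


# Hypothetical sequences satisfying the hypotheses of Lemma 9-6:
# a_k >= 0 with sum a_k = 2 < infinity, |b_n| <= 1, and b_n - b_{n-1} -> 0.
def a(k):
    return 0.5 ** k  # underflows harmlessly to 0.0 for very large k


def b(n):
    return math.sin(math.sqrt(n))


for n in (10, 1_000, 100_000):
    print(n, lemma_9_6_sum(n, a, b))
```

The printed sums shrink toward 0 as $n$ grows, as the lemma predicts. The sequence $b_n$ never settles down; only its increments vanish.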


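Theorem 9-7: Let $i$, $j$, and $k$ be arbitrary states in a recurrent Markov chain which is either null or noncyclic ergodic. Then
$$\lim_n \left[\left(N^{(n)}_{kk} - N^{(n)}_{ik}\right)\frac{\alpha_j}{\alpha_k} + \left(N^{(n)}_{ij} - N^{(n)}_{kj}\right)\right] = {}^kN_{ij}.$$

Proof: We may assume that neither $i$ nor $j$ equals $k$, since otherwise both sides are clearly zero. We begin by establishing four equations:
$$(1)\quad N^{(n)}_{kk} = \sum_{v=0}^{\infty} F^{(v)}_{ik}N^{(n)}_{kk}$$
$$(2)\quad N^{(n)}_{ik} = \sum_{v=0}^{n} F^{(v)}_{ik}N^{(n-v)}_{kk}$$
$$(3)\quad N^{(n)}_{kj} = \sum_{v=0}^{\infty} F^{(v)}_{ik}N^{(n)}_{kj}$$
$$(4)\quad N^{(n)}_{ij} = \sum_{v=0}^{n} F^{(v)}_{ik}N^{(n-v)}_{kj} + {}^kN^{(n)}_{ij}.$$
Equations (1) and (3) follow from the fact that $\sum_v F^{(v)}_{ik} = H_{ik} = 1$. Equation (2) comes from Theorem 4-11 with the random time $t = \min(t_k, n)$, and equation (4) is a similar result, except that the sum has been broken into two parts representing what happens after and before state $k$ is reached for the first time.

Multiply (1) by $\alpha_j/\alpha_k$, (2) by $-\alpha_j/\alpha_k$, (3) by $-1$, and (4) by $1$, and add. We obtain
$$\left(N^{(n)}_{kk} - N^{(n)}_{ik}\right)\frac{\alpha_j}{\alpha_k} + \left(N^{(n)}_{ij} - N^{(n)}_{kj}\right) = {}^kN^{(n)}_{ij} + \sum_{v=0}^{n} F^{(v)}_{ik}\left(b_n - b_{n-v}\right) + \sum_{v=n+1}^{\infty} F^{(v)}_{ik}\,b_n,$$
where $b_n = N^{(n)}_{kk}\alpha_j/\alpha_k - N^{(n)}_{kj}$, and $\{b_n\}$ is a bounded sequence by Lemma 9-2. The first term ${}^kN^{(n)}_{ij}$ on the right side tends to ${}^kN_{ij}$, and the third term tends to zero since $\{b_n\}$ is bounded and $\sum_v F^{(v)}_{ik}$ is finite. It is thus sufficient to show that
$$\lim_n \sum_{v=0}^{n} a_v\left(b_n - b_{n-v}\right) = 0,$$
where $a_v = F^{(v)}_{ik}$. Since $a_v \ge 0$, $\sum_v a_v = 1$, and $\{b_n\}$ is bounded, we need show only that $(b_n - b_{n-1}) \to 0$ to apply Lemma 9-6. But
$$b_n - b_{n-1} = \left(N^{(n)}_{kk} - N^{(n-1)}_{kk}\right)\frac{\alpha_j}{\alpha_k} - \left(N^{(n)}_{kj} - N^{(n-1)}_{kj}\right) = P^{(n)}_{kk}\frac{\alpha_j}{\alpha_k} - P^{(n)}_{kj} \to L_{kk}\frac{\alpha_j}{\alpha_k} - L_{kj} = 0.$$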
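Theorem 9-7 can be checked numerically on a small example. The Python sketch below is an illustration under stated assumptions, not part of the original text:

- It uses NumPy and a hypothetical 4-state noncyclic ergodic chain `P`.
- It computes ${}^kN$ as $(I - Q)^{-1}$, where $Q$ is $P$ restricted to the states other than $k$, padded with zeros on row and column $k$.
- It replaces the limit in $n$ by the value at $n = 300$.
- The helper names `stationary`, `taboo_N`, and `partial_N` are our own.

```python
import numpy as np

# Hypothetical 4-state noncyclic ergodic chain (illustration only).
P = np.array([
    [0.1, 0.5, 0.2, 0.2],
    [0.3, 0.1, 0.4, 0.2],
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.2, 0.3, 0.1],
])
S = P.shape[0]


def stationary(P):
    """Regular probability measure alpha: alpha P = alpha, alpha 1 = 1."""
    n = P.shape[0]
    lhs = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    alpha, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return alpha


def taboo_N(P, E):
    """{}^E N: mean visits to j before reaching E from i (zero on rows/columns of E)."""
    n = P.shape[0]
    keep = [s for s in range(n) if s not in E]
    NE = np.zeros((n, n))
    Q = P[np.ix_(keep, keep)]
    NE[np.ix_(keep, keep)] = np.linalg.inv(np.eye(len(keep)) - Q)
    return NE


def partial_N(P, n):
    """N^{(n)} = I + P + ... + P^n."""
    total = np.eye(P.shape[0])
    power = np.eye(P.shape[0])
    for _ in range(n):
        power = power @ P
        total += power
    return total


alpha = stationary(P)
Nn = partial_N(P, 300)  # n = 300 is far into the limit for this small chain

max_err = 0.0
for k in range(S):
    Nk = taboo_N(P, [k])  # {}^k N
    for i in range(S):
        for j in range(S):
            lhs = (Nn[k, k] - Nn[i, k]) * alpha[j] / alpha[k] + Nn[i, j] - Nn[k, j]
            max_err = max(max_err, abs(lhs - Nk[i, j]))
print("largest deviation from {}^k N_ij:", max_err)
```

Because the sketch is restricted to a finite (hence ergodic) chain, it illustrates only the noncyclic ergodic case of the theorem, not the null case.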


If in the recurrent chain $P$ we make a set of states $E$ absorbing, then ${}^EP$ is an absorbing chain and results about transient chains may be applied to it. For example, the result $N_{ij} = H_{ij}N_{jj}$ yields ${}^EN_{ij} = {}^EH_{ij}\,{}^EN_{jj}$. We shall also make frequent use of the fact noted in Section 6-2 that ${}^E\hat N_{ij} = (\alpha_j/\alpha_i)\,{}^EN_{ji}$.

At this point we begin developing the machinery needed for the main result of this section, Theorem 9-15. We first need two preliminary identities, which we establish as Propositions 9-12 and 9-13.


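Lemma 9-8: For any pair of states $i$ and $j$,
$$\frac{{}^iN_{jj}}{\alpha_j} = \frac{{}^jN_{ii}}{\alpha_i}.$$

Proof 1: If $i = j$, both sides are zero. If $i \ne j$, then from $H_{ij} = 1$ and (3) of Lemma 4-19, we find
$${}^i\bar H_{jj} + {}^j\bar H_{ji}H_{ij} = \bar H_{jj} = 1,$$
so that
$$1 - {}^i\bar H_{jj} = {}^j\bar H_{ji}.$$
Hence
$${}^iN_{jj} = \frac{1}{1 - {}^i\bar H_{jj}} = \frac{1}{{}^j\bar H_{ji}}.$$
Therefore, by Proposition 5-4 and Corollary 6-21,
$$\frac{{}^iN_{jj}}{\alpha_j} = \frac{1}{\alpha_j\,{}^j\bar H_{ji}} = \frac{1}{\alpha_i\,{}^i\bar H_{ij}} = \frac{{}^jN_{ii}}{\alpha_i}.$$

Proof 2: Set $i = j$ in Theorem 9-7. Then
$$\lim_n\left[\frac{N^{(n)}_{kk} - N^{(n)}_{ik}}{\alpha_k} + \frac{N^{(n)}_{ii} - N^{(n)}_{ki}}{\alpha_i}\right] = \frac{{}^kN_{ii}}{\alpha_i}.$$
Interchange $i$ and $k$, and the left side stays the same. The right side becomes ${}^iN_{kk}/\alpha_k$.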
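Lemma 9-9: For any states $i$, $j$, $k$ with $i \ne j$,
$$\frac{\alpha_i}{\alpha_j}\,{}^iN_{kj} + {}^jN_{ki} = {}^jN_{ii}.$$

Proof:
$$\frac{\alpha_i}{\alpha_j}\,{}^iN_{kj} + {}^jN_{ki} = \frac{\alpha_i}{\alpha_j}\,{}^iH_{kj}\,{}^iN_{jj} + {}^jH_{ki}\,{}^jN_{ii}$$
$$= \left({}^iH_{kj} + {}^jH_{ki}\right){}^jN_{ii} \quad\text{by Lemma 9-8}$$
$$= {}^jN_{ii}.$$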
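Lemmas 9-8 and 9-9 can be spot-checked on a finite example. The sketch below assumes NumPy and reuses the hypothetical 4-state chain from the Theorem 9-7 illustration. It computes ${}^sN$ by inverting $I - Q$ on the states other than $s$. The helpers are repeated so the block runs on its own, and the names are our own.

```python
import numpy as np

# Hypothetical 4-state noncyclic ergodic chain (illustration only).
P = np.array([
    [0.1, 0.5, 0.2, 0.2],
    [0.3, 0.1, 0.4, 0.2],
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.2, 0.3, 0.1],
])
S = P.shape[0]


def stationary(P):
    """Regular probability measure alpha: alpha P = alpha, alpha 1 = 1."""
    n = P.shape[0]
    lhs = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    alpha, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return alpha


def taboo_N(P, E):
    """{}^E N: mean visits to j before reaching E from i (zero on rows/columns of E)."""
    n = P.shape[0]
    keep = [s for s in range(n) if s not in E]
    NE = np.zeros((n, n))
    NE[np.ix_(keep, keep)] = np.linalg.inv(np.eye(len(keep)) - P[np.ix_(keep, keep)])
    return NE


alpha = stationary(P)
NT = [taboo_N(P, [s]) for s in range(S)]  # NT[s] is {}^s N

# Lemma 9-8: {}^i N_jj / alpha_j == {}^j N_ii / alpha_i (both sides 0 when i == j).
lemma_9_8 = all(
    np.isclose(NT[i][j, j] / alpha[j], NT[j][i, i] / alpha[i])
    for i in range(S) for j in range(S)
)

# Lemma 9-9 (i != j): (alpha_i / alpha_j) {}^i N_kj + {}^j N_ki == {}^j N_ii.
lemma_9_9 = all(
    np.isclose(alpha[i] / alpha[j] * NT[i][k, j] + NT[j][k, i], NT[j][i, i])
    for i in range(S) for j in range(S) for k in range(S) if i != j
)
print(lemma_9_8, lemma_9_9)
```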


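Lemma 9-10:
$$\left|N^{(n)}_{ki} - \frac{\alpha_i}{\alpha_0}N^{(n)}_{k0}\right| \le {}^0N_{ii}.$$

Proof: If $i = 0$, both sides are zero. Otherwise we have
$$\hat N^{(n)}_{ik} \le {}^0\hat N_{ik} + \hat N^{(n)}_{0k}.$$
If we multiply through by $\alpha_i/\alpha_k$, we obtain
$$N^{(n)}_{ki} \le {}^0N_{ki} + \frac{\alpha_i}{\alpha_0}\left(\frac{\alpha_0}{\alpha_k}\hat N^{(n)}_{0k}\right) = {}^0N_{ki} + \frac{\alpha_i}{\alpha_0}N^{(n)}_{k0}.$$
Hence
$$N^{(n)}_{ki} - \frac{\alpha_i}{\alpha_0}N^{(n)}_{k0} \le {}^0N_{ki}.$$
Interchanging $i$ and $0$ and multiplying by $\alpha_i/\alpha_0$ gives
$$\frac{\alpha_i}{\alpha_0}N^{(n)}_{k0} - N^{(n)}_{ki} \le \frac{\alpha_i}{\alpha_0}\,{}^iN_{k0}.$$
Therefore,
$$\left|N^{(n)}_{ki} - \frac{\alpha_i}{\alpha_0}N^{(n)}_{k0}\right| \le {}^0N_{ki} + \frac{\alpha_i}{\alpha_0}\,{}^iN_{k0} = {}^0N_{ii} \quad\text{by Lemma 9-9.}$$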
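The bound of Lemma 9-10 holds for every $n$, not just in the limit. The sketch below checks it for $n = 0, \ldots, 199$. It assumes NumPy and uses the same hypothetical 4-state chain as above, with state 0 as the reference state. The variable `worst` records the largest value of $|\cdots| - {}^0N_{ii}$, which the lemma says is never positive.

```python
import numpy as np

# Hypothetical 4-state noncyclic ergodic chain (illustration only).
P = np.array([
    [0.1, 0.5, 0.2, 0.2],
    [0.3, 0.1, 0.4, 0.2],
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.2, 0.3, 0.1],
])
S = P.shape[0]


def stationary(P):
    """Regular probability measure alpha: alpha P = alpha, alpha 1 = 1."""
    n = P.shape[0]
    lhs = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    alpha, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return alpha


def taboo_N(P, E):
    """{}^E N: mean visits to j before reaching E from i (zero on rows/columns of E)."""
    n = P.shape[0]
    keep = [s for s in range(n) if s not in E]
    NE = np.zeros((n, n))
    NE[np.ix_(keep, keep)] = np.linalg.inv(np.eye(len(keep)) - P[np.ix_(keep, keep)])
    return NE


alpha = stationary(P)
N0 = taboo_N(P, [0])      # {}^0 N; note {}^0 N_00 = 0

worst = -np.inf
Nn = np.eye(S)            # N^{(0)} = I
power = np.eye(S)
for n in range(200):
    if n > 0:
        power = power @ P
        Nn = Nn + power   # N^{(n)} = N^{(n-1)} + P^n
    for k in range(S):
        for i in range(S):
            gap = abs(Nn[k, i] - alpha[i] / alpha[0] * Nn[k, 0]) - N0[i, i]
            worst = max(worst, gap)
print("max over n, k, i of |...| - {}^0 N_ii:", worst)
```

The value 0 is attained at $i = 0$, where both sides vanish. All other cases give a strictly negative gap for this chain.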


As usual, we agree that ${}^EN$, and in particular ${}^0N$, is a square matrix indexed by the set of all states; the entries of ${}^EN$ on the rows or columns indexed by $E$ are all zeros.


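Lemma 9-11: If $|\mu|\mathbf 1$ is finite, then $\mu\,{}^0N$ is finite-valued. Dually, if $\alpha|f|$ is finite, then ${}^0Nf$ is finite-valued.

Proof: If $|\mu|\mathbf 1$ is finite, then
$$\sum_i |\mu_i|\,{}^0N_{ij} = \sum_i |\mu_i|\,{}^0H_{ij}\,{}^0N_{jj} \le {}^0N_{jj}\sum_i |\mu_i| < \infty.$$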
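Proposition 9-12: If $\mu\mathbf 1 = 0$, then
$$\lim_{n\to\infty}\sum_k \mu_k\left[N^{(n)}_{kj} - \frac{\alpha_j}{\alpha_0}N^{(n)}_{k0}\right] = \sum_k \mu_k\,{}^0N_{kj}.$$
Dually, if $\alpha f = 0$, then
$$\lim_{n\to\infty}\sum_k\left[N^{(n)}_{ik} - N^{(n)}_{0k}\right]f_k = \sum_k {}^0N_{ik}f_k.$$

Proof: Let
$$S^{(n)}_{kj} = \left[N^{(n)}_{kj} - N^{(n)}_{0j}\right] + \frac{\alpha_j}{\alpha_0}\left[N^{(n)}_{00} - N^{(n)}_{k0}\right].$$
Since $\mu\mathbf 1 = 0$, we have
$$\sum_k \mu_k\left[N^{(n)}_{kj} - \frac{\alpha_j}{\alpha_0}N^{(n)}_{k0}\right] = \sum_k \mu_k S^{(n)}_{kj}.$$
By Theorem 9-7, $\lim_n S^{(n)}_{kj} = {}^0N_{kj}$. Hence if we can prove that $S^{(n)}_{kj}$ is bounded independently of $n$ and $k$, then, since $|\mu|\mathbf 1$ is finite, the result of the proposition follows by dominated convergence. (Note that $\sum_k \mu_k\,{}^0N_{kj}$ is absolutely convergent by Lemma 9-11.) We have
$$\left|S^{(n)}_{kj}\right| \le \left|N^{(n)}_{00}\frac{\alpha_j}{\alpha_0} - N^{(n)}_{0j}\right| + \left|N^{(n)}_{kj} - \frac{\alpha_j}{\alpha_0}N^{(n)}_{k0}\right|.$$
The first term on the right is bounded according to Lemma 9-2, and the second term is bounded according to Lemma 9-10.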
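Proposition 9-13: If $\alpha f = 0$, then $f$ is a charge if and only if
$$\lim_n\left[P^n({}^0Nf)\right]$$
exists and is finite-valued. If $f$ is a charge, then its potential $g$ satisfies
$$g = {}^0Nf - \lim_n\left[P^n({}^0Nf)\right].$$

Proof: We have
$$(P\,{}^0N)_{ij} = \sum_k P_{ik}\,{}^0N_{kj} = \begin{cases} 0 & \text{if } j = 0, \text{ since } {}^0N_{k0} = 0,\\ \alpha_j/\alpha_0 & \text{if } i = 0 \text{ and } j \ne 0,\\ {}^0N_{ij} - \delta_{ij} & \text{if } i \ne 0 \text{ and } j \ne 0,\end{cases}$$
so that
$$((I - P)\,{}^0N)_{ij} = \begin{cases} 0 & \text{if } j = 0,\\ -\alpha_j/\alpha_0 & \text{if } i = 0 \text{ and } j \ne 0,\\ \delta_{ij} & \text{if } i \ne 0 \text{ and } j \ne 0.\end{cases}$$
By Lemma 9-11, ${}^0Nf$ is finite-valued, and hence associativity holds in the triple product $(I - P)\,{}^0Nf$. We find
$$((I - P)\,{}^0Nf)_i = \begin{cases}\sum_{j\ne0}\delta_{ij}f_j = f_i & \text{for } i \ne 0,\\ -\sum_{j\ne0}\frac{\alpha_j}{\alpha_0}f_j = -\frac{1}{\alpha_0}(\alpha f - \alpha_0f_0) = f_0 & \text{for } i = 0.\end{cases}$$
The last equality in the case $i = 0$ uses the fact that $\alpha f = 0$. We see therefore that $(I - P)\,{}^0Nf = f$, so that
$$\lim_n(I + P + \cdots + P^{n-1})f = \lim_n(I + P + \cdots + P^{n-1})\left[(I - P)\,{}^0Nf\right] = \lim_n(I - P^n)\,{}^0Nf.$$
Associativity where required in the last equation follows from the distributive property, which holds because ${}^0Nf$ is finite-valued.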


In the discussion that follows, $E$ and $F$ denote non-empty subsets of states.

Lemma 9-14: $B^E\,{}^FN = {}^FN - {}^EN$, provided $F \subseteq E$.

Proof: By Theorem 4-11 with the time to reach $E$ as the random time, we find that $(B^E\,{}^FN)_{ij} = \sum_k B^E_{ik}\,{}^FN_{kj}$ is the mean of the total number of times that the process is in state $j$ before $F$, counting from the time when $E$ is first entered. This mean is the difference of the number of times in $j$ before $F$ and the number of times in $j$ before $E$, since $F$ cannot be entered unless $E$ is entered.
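Lemma 9-14 can be checked directly for a finite chain. The sketch below assumes NumPy and the hypothetical 4-state chain used above. It computes the hitting matrix $B^E$ by solving $(I - Q)X = R$ on the states outside $E$, the standard absorbing-chain formula. The sets $E = \{0, 2\}$ and $F = \{0\}$ are arbitrary choices with $F \subseteq E$. The names `hitting_B` and `taboo_N` are our own.

```python
import numpy as np

# Hypothetical 4-state noncyclic ergodic chain (illustration only).
P = np.array([
    [0.1, 0.5, 0.2, 0.2],
    [0.3, 0.1, 0.4, 0.2],
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.2, 0.3, 0.1],
])


def taboo_N(P, E):
    """{}^E N: mean visits to j before reaching E from i (zero on rows/columns of E)."""
    n = P.shape[0]
    keep = [s for s in range(n) if s not in E]
    NE = np.zeros((n, n))
    NE[np.ix_(keep, keep)] = np.linalg.inv(np.eye(len(keep)) - P[np.ix_(keep, keep)])
    return NE


def hitting_B(P, E):
    """B^E[i, j]: probability that E is first entered at state j, starting from i."""
    n = P.shape[0]
    E = list(E)
    keep = [s for s in range(n) if s not in E]
    B = np.zeros((n, n))
    B[E, E] = 1.0                      # already in E at time 0
    Q = P[np.ix_(keep, keep)]
    R = P[np.ix_(keep, E)]
    B[np.ix_(keep, E)] = np.linalg.solve(np.eye(len(keep)) - Q, R)
    return B


E, F = [0, 2], [0]                     # hypothetical sets with F contained in E
B_E = hitting_B(P, E)
lhs = B_E @ taboo_N(P, F)
rhs = taboo_N(P, F) - taboo_N(P, E)
print(np.allclose(lhs, rhs))
```

The rows of `B_E` also sum to 1, the fact $B^E\mathbf 1 = \mathbf 1$ that is used in the proof of Theorem 9-15.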


Theorem 9-15: If $f$ is a function with $\alpha f = 0$ and if
$$\lim_n\left[(I + P + \cdots + P^n)f\right]_0$$
exists and is finite for some state 0, then $f$ is a charge. If $g$ is its potential, then

(1) $g = {}^0Nf + g_0\mathbf 1$.

(2) $f = (I - P)g$.

(3) $P^ng \to 0$.

(4) $B^Eg = g$ if the support of $f$ is contained in $E$.

(5) $f_E = (I - P^E)g_E$ if the support of $f$ is contained in $E$.

Proof: By Proposition 9-12, if $\lim_n\sum_k N^{(n)}_{0k}f_k$ exists, then
$$\lim_n\sum_k\left(N^{(n)}_{ik} - N^{(n)}_{0k}\right)f_k = \sum_k {}^0N_{ik}f_k,$$
or
$$\left(\lim_n\sum_k N^{(n)}_{ik}f_k\right) - \left(\lim_n\sum_k N^{(n)}_{0k}f_k\right) = ({}^0Nf)_i.$$
Hence $\lim_n\sum_k N^{(n)}_{ik}f_k$ exists, and $f$ is a charge.

(1) Therefore $g_i - g_0 = ({}^0Nf)_i$, and (1) follows.

(2) From the proof of Proposition 9-13 we have $(I - P)\,{}^0Nf = f$, since $\alpha f = 0$. Thus if we multiply (1) by $I - P$, we get (2).

(3) By (2) we have $g = Pg + f$, and, since $P^kf$ is finite-valued, we see by induction that $P^kg$ is finite-valued and that
$$P^{k-1}g = P^kg + P^{k-1}f.$$
Adding these relations for $k = 1, \ldots, n$, we obtain
$$g = P^ng + (I + P + \cdots + P^{n-1})f.$$
Hence $P^ng \to 0$.

(4) Let $0 \in E$. By Lemma 9-14 with $F = \{0\}$,
$$B^E\,{}^0N = {}^0N - {}^EN,$$
and by (1)
$$B^Eg = B^E\,{}^0Nf + B^Eg_0\mathbf 1 = {}^0Nf - {}^ENf + g_0\mathbf 1 \quad\text{since } B^E\mathbf 1 = \mathbf 1,$$
$$= g - {}^ENf \quad\text{by (1)}.$$
Since $f$ has support in $E$, ${}^ENf = 0$ because ${}^EN_{ij} = 0$ when $j \in E$.

(5)
$$f = (I - P)g = (I - P)(B^Eg) = \left[(I - P)B^E\right]g = \begin{pmatrix}I - P^E & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}g_E\\ g_{\tilde E}\end{pmatrix} \quad\text{by Proposition 6-17}$$
$$= \begin{pmatrix}(I - P^E)g_E\\ 0\end{pmatrix}.$$

We note from conclusion (2) that a potential uniquely determines its 
charge. 
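For a finite noncyclic ergodic chain, $\lim_n P^n = A = \mathbf 1\alpha$, so Proposition 9-13 and conclusions (1) and (2) of Theorem 9-15 can all be compared numerically. The sketch below assumes NumPy, the hypothetical 4-state chain used earlier, and arbitrary data `f0` that is centred so that $\alpha f = 0$. It also truncates the series defining the potential at 300 terms, which is harmless here because $P^mf \to 0$ geometrically.

```python
import numpy as np

# Hypothetical 4-state noncyclic ergodic chain (illustration only).
P = np.array([
    [0.1, 0.5, 0.2, 0.2],
    [0.3, 0.1, 0.4, 0.2],
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.2, 0.3, 0.1],
])
S = P.shape[0]


def stationary(P):
    """Regular probability measure alpha: alpha P = alpha, alpha 1 = 1."""
    n = P.shape[0]
    lhs = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    alpha, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return alpha


def taboo_N(P, E):
    """{}^E N: mean visits to j before reaching E from i (zero on rows/columns of E)."""
    n = P.shape[0]
    keep = [s for s in range(n) if s not in E]
    NE = np.zeros((n, n))
    NE[np.ix_(keep, keep)] = np.linalg.inv(np.eye(len(keep)) - P[np.ix_(keep, keep)])
    return NE


alpha = stationary(P)
f0 = np.array([1.0, -2.0, 0.5, 3.0])   # arbitrary hypothetical data
f = f0 - alpha @ f0                     # subtract the integral, so alpha f = 0

# Potential g = lim_n (I + P + ... + P^{n-1}) f, truncated at n = 300.
g = np.zeros(S)
term = f.copy()
for _ in range(300):
    g += term
    term = P @ term                     # term = P^m f, which tends to 0

N0f = taboo_N(P, [0]) @ f
A = np.outer(np.ones(S), alpha)         # A = lim P^n = 1 alpha for this chain

print(np.allclose(g, N0f + g[0]))            # conclusion (1): g = {}^0N f + g_0 1
print(np.allclose((np.eye(S) - P) @ g, f))   # conclusion (2): f = (I - P) g
print(np.allclose(g, N0f - A @ N0f))         # Proposition 9-13 with lim P^n = A
```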


Corollary 9-16: If $(I - P)g = f$, then $g$ is a potential if and only if $\alpha f$ is finite and $P^ng \to 0$.

Proof: If $g$ is a potential, then $P^ng \to 0$ by conclusion (3) of Theorem 9-15 and $f$ is the charge by conclusion (2). Hence $\alpha f$ is finite by definition. Conversely, if $(I - P)g = f$, we have by induction
$$g = P^ng + (I + P + \cdots + P^{n-1})f.$$
If $P^ng \to 0$, then $g = \lim_n\left[(I + P + \cdots + P^{n-1})f\right]$; if $\alpha f$ is finite, then $g$ is a potential by definition.


We already know that the columns of $I - L$ are potentials whose charges are the corresponding columns of $I - P$. Corollary 9-16 allows us to enlarge this result as follows.


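Corollary 9-17: If $P$ is a null chain and if $g$ is a function for which $\alpha g$ is finite, then $g$ is a potential.

Proof: We shall apply Corollary 9-16. By writing $g = g^+ - g^-$, we may assume that $g \ge 0$. Then
$$(P^ng)_i = \sum_j P^{(n)}_{ij}g_j = \frac{1}{\alpha_i}\sum_j(\alpha_jg_j)\hat P^{(n)}_{ji}.$$
Since $\alpha g$ is finite and since $\hat P^{(n)}$ is bounded by one, we have by dominated convergence (recall that $\hat P$ is also a null chain)
$$\lim_n(P^ng)_i = \frac{1}{\alpha_i}\sum_j(\alpha_jg_j)\lim_n\hat P^{(n)}_{ji} = 0.$$
Therefore $P^ng \to 0$. Set $f = (I - P)g$. Then
$$\alpha|f| \le \alpha(g + Pg) = \alpha g + \alpha(Pg) = \alpha g + (\alpha P)g = \alpha g + \alpha g < \infty.$$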


A corresponding result for noncyclic ergodic chains will be proved in 
Section 3. 
2. Normal chains


The condition at the end of Section 1 that $g$ is a potential in a null chain if $\alpha g$ is finite does not provide a sufficiently large class of potentials for a satisfactory potential theory. In this section we shall impose the condition on $P$ that any function $f$ of finite support with $\alpha f = 0$ be a charge and that the corresponding result for signed measures hold; such a chain will be called normal. The justification for considering normal chains will consist in our showing that the class of normal chains is quite extensive; we shall see in Section 3, for example, that all noncyclic ergodic chains are normal.

Our procedure in introducing normal chains will be not to define them as above but to give a definition which is computationally simpler to check. The point of departure is the identity
$$g = {}^0Nf - \lim_n\left[P^n({}^0Nf)\right]$$
of Proposition 9-13.

Proposition 9-18: If for each $j$ there is some $i$ such that $\lim_n(P^n\,{}^0N)_{ij}$ exists, then $\lim_n(P^n\,{}^0N)$ exists and has constant columns which are finite-valued.

Proof: Let $j$ be a fixed state and let
$$f_k = \begin{cases}\alpha_j/\alpha_0 & \text{for } k = 0,\\ -1 & \text{for } k = j,\\ 0 & \text{otherwise.}\end{cases}$$
Since $\alpha f = 0$, we have $(I - P)\,{}^0Nf = f$. Hence
$$\lim_n\left[(I + P + \cdots + P^{n-1})f\right]_i = \lim_n\left[(I + P + \cdots + P^{n-1})(I - P)\,{}^0Nf\right]_i = \lim_n\left[(I - P^n)\,{}^0Nf\right]_i = ({}^0Nf)_i + \lim_n(P^n\,{}^0N)_{ij}.$$
This limit exists for some $i$ by hypothesis, and hence, by Theorem 9-15, $f$ is a charge and $\lim_n(P^n\,{}^0N)_{kj}$ exists for all $k$. Hence $\lim_n(P^n\,{}^0N)$ exists. By Fatou's Theorem,
$$P\lim_n(P^n\,{}^0N) \le \lim_n P\,P^n\,{}^0N = \lim_n(P^n\,{}^0N),$$
so that each column of the limit is non-negative superregular. Hence the columns of the limit are constants by Proposition 6-3.


Definition 9-19: If the indicated limits exist, then
$${}^i\nu_j = \lim_n\sum_k(P^n)_{mk}\,{}^iN_{kj}$$
and
$${}^i\lambda_j = \lim_n\sum_k(P^n)_{mk}\,{}^iH_{kj}.$$

Notation independent of $m$ in Definition 9-19 is justified by Proposition 9-18. We note that ${}^i\nu_j$ exists if and only if ${}^i\lambda_j$ exists, that ${}^i\nu_j$ is finite, and that $0 \le {}^i\lambda_j \le 1$. Furthermore, ${}^i\lambda_j$ is the probability that $j$ is entered in the long run before $i$, and ${}^i\nu_j$ is the mean number of times in $j$, in the long run, before reaching $i$.


Definition 9-20: A chain is normal if for some fixed state 0 and for all $j$, ${}^0\lambda_j$ and ${}^0\hat\lambda_j$ both exist.


We note the important fact that the dual of a normal chain is normal. 


Proposition 9-21: If $P$ is a normal chain and if, for a given function $f$ with $\alpha f = 0$, $\sum_k {}^0N_{kk}f_k$ is finite, then its potential $g$ exists, is bounded, and satisfies $g = [{}^0N - \mathbf 1\,{}^0\nu]f$.

Proof: Since $P$ is normal, $P^n\,{}^0N \to \mathbf 1\,{}^0\nu$. Furthermore,
$$\sum_k(P^n)_{ik}\,{}^0N_{kj} \le \sum_k(P^n)_{ik}\,{}^0N_{jj} = {}^0N_{jj},$$
and ${}^0Nf^+$ and ${}^0Nf^-$ are both finite-valued. Thus we have dominated convergence in Proposition 9-13, $f$ is a charge, and $g = [{}^0N - \mathbf 1\,{}^0\nu]f$. Since by Theorem 9-15 $|g_i - g_0| = \left|\sum_k {}^0N_{ik}f_k\right| \le \sum_k {}^0N_{kk}|f_k|$, $g$ is bounded.


Corollary 9-22: If $P$ is normal and if $f$ is a function of finite support with $\alpha f = 0$, then $f$ is a charge.

Proof: We have $\sum_k {}^0N_{kk}|f_k| < \infty$. Apply Proposition 9-21.

The converse to Corollary 9-22 is the following.


Lemma 9-23: If the function $f$ defined by
$$f_i = \begin{cases}\alpha_j/\alpha_0 & \text{if } i = 0,\\ -1 & \text{if } i = j,\\ 0 & \text{otherwise}\end{cases}$$
is a charge with potential $g$, then ${}^0\lambda_j$ exists and $g_0 = {}^0\nu_j$. If all functions $f$ of support $\{0, j\}$ with $\alpha f = 0$ are charges and if all signed measures $\mu$ of support $\{0, j\}$ with $\mu\mathbf 1 = 0$ are charges, then $P$ is normal.

Proof: For the first assertion we have, by Proposition 9-13,
$$g_0 = ({}^0Nf)_0 - \lim_n(P^n\,{}^0Nf)_0$$
$$= -\lim_n(P^n\,{}^0Nf)_0 \quad\text{since } {}^0N_{0k} = 0 \text{ for all } k$$
$$= \lim_n(P^n\,{}^0N)_{0j} \quad\text{since } {}^0N_{k0} = 0 \text{ for all } k$$
$$= {}^0\nu_j.$$
Hence ${}^0\lambda_j$ exists. For the second assertion we see from the hypothesis about functions that ${}^0\lambda_j$ exists for all $j$. Dually we obtain from the hypothesis about signed measures that ${}^0\hat\lambda_j$ exists for all $j$. Hence $P$ is normal.


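Definition 9-24: Matrices $C$ and $G$ are defined by
$$C_{ij} = \lim_n\left[N^{(n)}_{jj} - N^{(n)}_{ij}\right],$$
$$G_{ij} = \lim_n\left[N^{(n)}_{ii}\frac{\alpha_j}{\alpha_i} - N^{(n)}_{ij}\right],$$
whenever the indicated limits exist.

According to Lemma 9-2, the quantities defining $C_{ij}$ and $G_{ij}$ are bounded and non-negative. Thus all entries of $C$ and $G$ which exist are finite and non-negative. We have further that $C_{ii} = G_{ii} = 0$ for every $i$ and that
$$\hat G_{ij} = \frac{\alpha_j}{\alpha_i}C_{ji}.$$
Hence $\hat G$ = dual $C$ and $\hat C$ = dual $G$.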


Lemma 9-25: $G_{0j}$ exists if and only if ${}^0\nu_j$ exists. If they both exist, then $G_{0j} = {}^0\nu_j$. Dually $C_{j0} = (\alpha_0/\alpha_j)\,{}^0\hat\nu_j$.

Proof: We need only note that for the potential defined in Lemma 9-23
$$g_0 = \lim_n\left[N^{(n)}_{00}\frac{\alpha_j}{\alpha_0} - N^{(n)}_{0j}\right] = G_{0j}.$$
Hence $G_{0j}$ exists if and only if $g_0 = {}^0\nu_j$ exists, and then they are equal. The other result follows by duality.
Theorem 9-26: If for some fixed state 0 and all other states $j$,

(1) ${}^0\lambda_j$ and ${}^0\hat\lambda_j$ both exist,

or (2) $G_{0j}$ and $C_{j0}$ both exist,

or (3) all functions and signed measures with support $\{0, j\}$ and total charge 0 are potential charges,

then $P$ is normal. Conversely, if $P$ is normal, then $C$, $G$, ${}^i\lambda_j$, and ${}^i\nu_j$ all exist (for both $P$ and $\hat P$), $G_{ij} = {}^i\nu_j = {}^i\lambda_j\,{}^iN_{jj}$, and
$$C_{ij} = \frac{\alpha_j}{\alpha_i}\,{}^j\hat\nu_i = \frac{\alpha_j}{\alpha_i}\,{}^j\hat\lambda_i\,{}^jN_{ii},$$
and all functions and signed measures of finite support and total charge 0 are potential charges.

Proof: (1) is the definition of normality. If (2) holds, then ${}^0\nu_j$ and ${}^0\hat\nu_j$ exist by Lemma 9-25, and (1) holds. The sufficiency of (3) was shown in Lemma 9-23.

Conversely, suppose that $P$ is normal. Then Corollary 9-22 assures that all $f$ with finite support and $\alpha f = 0$ are potential charges. Consider the charge
$$f_k = \begin{cases}\alpha_j/\alpha_i & \text{if } k = i,\\ -1 & \text{if } k = j,\\ 0 & \text{otherwise.}\end{cases}$$
By Lemmas 9-23 and 9-25, its potential has as $i$th component $G_{ij} = {}^i\nu_j$, and ${}^i\nu_j = {}^i\lambda_j\,{}^iN_{jj}$ by definition. The remaining assertions follow by duality.


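Corollary 9-27: If $P$ is normal, then
$${}^0N_{ij} - {}^0\nu_j = -\left[G_{ij} - G_{i0}\frac{\alpha_j}{\alpha_0}\right].$$

Proof: The potential of Lemma 9-23 has 0th component $g_0 = {}^0\nu_j$. By Theorem 9-15,
$$g_i = g_0 + ({}^0Nf)_i = {}^0\nu_j - {}^0N_{ij}.$$
By definition,
$$g_i = \lim_n\left[N^{(n)}_{i0}\frac{\alpha_j}{\alpha_0} - N^{(n)}_{ij}\right] = G_{ij} - G_{i0}\frac{\alpha_j}{\alpha_0}.$$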


Corollary 9-28: If $P$ is a normal chain and if $f$ is a function with $\alpha f = 0$ and $\sum_k {}^0N_{kk}f_k$ finite, then $f$ is a charge and its potential $g$ satisfies
$$g = -Gf.$$

Proof: By Proposition 9-21,
$$g_i = \sum_j\left[{}^0N_{ij} - {}^0\nu_j\right]f_j = \sum_j\left[-G_{ij} + G_{i0}\frac{\alpha_j}{\alpha_0}\right]f_j \quad\text{by Corollary 9-27}$$
$$= -(Gf)_i \quad\text{since } \alpha f = 0.$$


The dual of Corollary 9-28 is that in a normal chain if $\mu$ is a signed measure with $\mu\mathbf 1 = 0$ and $\sum_k \mu_k\,{}^0N_{kk}$ finite, then $\mu$ is a charge and its potential $\nu$ satisfies $\nu = -\mu C$.

We can use Theorem 9-26 to conclude that all symmetric sums of 
independent random variables processes are normal; in particular, the 
one-dimensional and two-dimensional symmetric random walks are 
normal. 


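Corollary 9-29: If $P$ is a null or noncyclic ergodic sums of independent random variables process with $P = P^T$, then $P$ is normal,
$$C_{ij} = G_{ij} = \tfrac12\,{}^jN_{ii}, \quad\text{and}\quad {}^i\lambda_j = \tfrac12.$$

Proof: If we put $j$ in for $k$ and $i$ in for $i$ and $j$ in Theorem 9-7, we obtain
$$\lim_n\left[\left(N^{(n)}_{jj} - N^{(n)}_{ij}\right)\frac{\alpha_i}{\alpha_j} + \left(N^{(n)}_{ii} - N^{(n)}_{ji}\right)\right] = {}^jN_{ii}.$$
Now $\alpha_i = \alpha_j$ and $N^{(n)}_{ii} = N^{(n)}_{jj}$ in sums of independent random variables, and $N^{(n)}_{ij} = N^{(n)}_{ji}$ in a symmetric process. Hence
$$\lim_n\left[2\left(N^{(n)}_{jj} - N^{(n)}_{ij}\right)\right] = {}^jN_{ii},$$
or $C_{ij} = \frac12\,{}^jN_{ii}$. Alternatively
$$\lim_n\left[2\left(N^{(n)}_{ii}\frac{\alpha_j}{\alpha_i} - N^{(n)}_{ij}\right)\right] = {}^jN_{ii},$$
or $G_{ij} = \frac12\,{}^jN_{ii} = \frac12\,{}^iN_{jj}$. Hence ${}^i\lambda_j = \frac12$.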
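A finite instance of Corollary 9-29 is the lazy symmetric random walk on the cycle $\mathbb Z_5$. This hypothetical example stays put with probability 1/2 and moves one step either way with probability 1/4. It is a noncyclic ergodic sums of independent random variables process with $P = P^T$.

The sketch below assumes NumPy and approximates the defining limits of $C$ and $G$ by their values at $n = 500$. It then checks $C = G = \frac12\,{}^jN_{ii}$ and ${}^i\lambda_j = G_{ij}/{}^iN_{jj} = \frac12$ for $i \ne j$.

```python
import numpy as np

m = 5  # hypothetical lazy symmetric walk on the cycle Z_5
P = np.zeros((m, m))
for x in range(m):
    P[x, x] += 0.5
    P[x, (x + 1) % m] += 0.25
    P[x, (x - 1) % m] += 0.25
alpha = np.full(m, 1.0 / m)  # P is doubly stochastic, so the uniform alpha is regular


def taboo_N(P, E):
    """{}^E N: mean visits to j before reaching E from i (zero on rows/columns of E)."""
    n = P.shape[0]
    keep = [s for s in range(n) if s not in E]
    NE = np.zeros((n, n))
    NE[np.ix_(keep, keep)] = np.linalg.inv(np.eye(len(keep)) - P[np.ix_(keep, keep)])
    return NE


def partial_N(P, n):
    """N^{(n)} = I + P + ... + P^n."""
    total = np.eye(P.shape[0])
    power = np.eye(P.shape[0])
    for _ in range(n):
        power = power @ P
        total += power
    return total


Nn = partial_N(P, 500)
d = np.diag(Nn)
C = d[np.newaxis, :] - Nn                                      # C_ij ~ N_jj - N_ij
G = (d / alpha)[:, np.newaxis] * alpha[np.newaxis, :] - Nn     # G_ij ~ N_ii a_j/a_i - N_ij
half_N = np.array([[0.5 * taboo_N(P, [j])[i, i] for j in range(m)] for i in range(m)])
lam = np.array([[G[i, j] / taboo_N(P, [i])[j, j] for j in range(m) if j != i]
                for i in range(m)])

print(np.allclose(C, G), np.allclose(C, half_N))
print(lam)  # every entry should be close to 1/2
```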


The strongest known result for concluding that a function f is in the 
domain of the potential operator G is Corollary 9-28, but that result 
involves a condition that is hard to check. The definition and theorem 
to follow give a more useful condition. 


Definition 9-30: A function $f$ in a normal chain is said to be a weak charge if $\alpha f = 0$ and if $Gf$ and $\hat Cf$ are both finite-valued. A signed measure $\mu$ is called a weak charge if $\mu\mathbf 1 = 0$ and if $\mu C$ and $\mu\hat G$ are both finite-valued.

The dual of a weak charge for $P$ is a weak charge for $\hat P$.


Theorem 9-31: If $P$ is a normal chain and if $f$ is a weak charge, then $f$ is a charge and its potential $g$ is bounded and satisfies $g = -Gf$.

Proof: Set $i$ and $j$ both equal to $k$ in Corollary 9-27. Then
$${}^0N_{kk} - {}^0\nu_k = -\left[G_{kk} - G_{k0}\frac{\alpha_k}{\alpha_0}\right] = G_{k0}\frac{\alpha_k}{\alpha_0}.$$
Since ${}^0\nu_k = G_{0k}$, ${}^0N_{kk} = G_{k0}(\alpha_k/\alpha_0) + G_{0k}$. Thus
$$\sum_k {}^0N_{kk}f_k = \sum_k\left[\frac{\alpha_k}{\alpha_0}G_{k0} + G_{0k}\right]f_k = (\hat Cf)_0 + (Gf)_0,$$
and $\sum_k {}^0N_{kk}f_k$ must be finite. The result follows by Proposition 9-21 and Corollary 9-28.

Dually if $P$ is normal and $\mu$ is a weak charge, then $\nu$ exists, is bounded by a multiple of $\alpha$, and satisfies $\nu = -\mu C$.

For the symmetric random walks in one and two dimensions, we have $C = G = \hat C = \hat G$ by Corollary 9-29. Therefore, Theorem 9-31 states for these cases that if $\alpha f = 0$ and if $Gf$ is finite-valued, then $f$ is a charge and its potential is bounded and satisfies $g = -Gf$; this is the analog of the Brownian motion result.

We shall now introduce the recurrent analog of the equilibrium set 
for transient chains—namely, the small ergodic set. Potentials with 
supports in small ergodic sets will be found to have special properties. 


Lemma 9-32: If $h$ is a non-negative bounded column vector such that $\lim_n P^nh$ exists, then the limit is a constant vector.

Proof: By Fatou's Theorem
$$P\lim_n(P^nh) \le \lim_n P^{n+1}h = \lim_n P^nh.$$
Thus the limit is finite-valued, non-negative, and superregular. Hence it is constant.

In particular, if $\lim_n P^nB^E$ exists, then it has constant columns.


Definition 9-33: If the indicated limit exists, then
$$\lim_{n\to\infty}P^nB^E = \mathbf 1\lambda^E.$$

The entry $\lambda^E_i$ is the probability in the long run of entering $E$ at $i$. In the special case where $E$ is a two-point set, we have ${}^i\lambda_j = \lambda^{\{i,j\}}_j$.
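Proposition 9-34: If $\lambda^E$ exists, then $\lambda^E \ge 0$, $\lambda^E_j = 0$ for $j \notin E$, and $\lambda^E\mathbf 1 \le 1$.

Proof: The first two assertions follow from the facts that $P^nB^E \ge 0$ for every $n$ and that $(P^nB^E)_{ij} = 0$ if $j$ is not in $E$. Also,
$$(\lambda^E\mathbf 1)\mathbf 1 = (\mathbf 1\lambda^E)\mathbf 1 = \left(\lim_n P^nB^E\right)\mathbf 1 \le \lim_n\left(P^nB^E\mathbf 1\right) = \mathbf 1,$$
so that $\lambda^E\mathbf 1 \le 1$.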


Definition 9-35: If $\lambda^E$ exists and is such that $\lambda^E\mathbf 1 = 1$, then $E$ is said to be a small set.


To justify the name small set and to prove existence of small sets, 
we shall show that subsets of small sets are small and that in a normal 
chain all finite sets are small. 


Proposition 9-36: If $E$ is a subset of a small set $F$, then $E$ is small and $\lambda^E = \lambda^FB^E$.

Proof: By Proposition 5-8, we have $B^E = B^FB^E$. Since each row of $P^nB^F$ tends to the probability vector $\lambda^F$, it follows from Proposition 1-57 that
$$P^nB^E = P^nB^FB^E \to \mathbf 1\lambda^FB^E.$$
Thus $\lambda^E$ exists and $\lambda^E = \lambda^FB^E$. In addition, $\lambda^E\mathbf 1 = \lambda^FB^E\mathbf 1 = \lambda^F\mathbf 1 = 1$.
Lemma 9-37: If $P$ is a normal chain and if $\sum_{k\in E}{}^0N_{kk}P^E_{kj}$ is finite for every $j \in E$, then $\lambda^E$ exists and the columns of $B^E - \mathbf 1\lambda^E$ are potentials with support in $E$.

Proof:
$$(I - P)B^E = \begin{pmatrix}I - P^E & 0\\ 0 & 0\end{pmatrix},$$
and the columns on the right are charges for a bounded potential by Proposition 9-21. Thus the limits
$$\lim_n(I + P + \cdots + P^{n-1})\begin{pmatrix}I - P^E & 0\\ 0 & 0\end{pmatrix} = \lim_n(I - P^n)B^E$$
exist, and the resulting potentials are the columns of $B^E - \mathbf 1\lambda^E$.


Proposition 9-38: In a normal chain all finite sets are small.

Proof: Let $E$ be a finite set. By Lemma 9-37, $\lambda^E$ exists. Moreover
$$\lambda^E\mathbf 1 = \sum_{j\in E}\lambda^E_j = \sum_{j\in E}\lim_n(P^nB^E)_{ij} = \lim_n\sum_{j\in E}(P^nB^E)_{ij} = \lim_n(P^nB^E\mathbf 1)_i = 1.$$


Corollary 9-39: In a normal chain ${}^i\lambda_j + {}^j\lambda_i = 1$.

Proof: Since ${}^i\lambda_j = \lambda^{\{i,j\}}_j$ and ${}^j\lambda_i = \lambda^{\{i,j\}}_i$, we may apply Proposition 9-38.


Definition 9-40: A set $E$ of states is an ergodic set if $P^E$ is an ergodic chain.

Proposition 9-41: $E$ is ergodic if and only if $\alpha_E\mathbf 1 < \infty$. All finite sets are ergodic, a subset of an ergodic set is ergodic, and the union of two ergodic sets is ergodic.

Proof: $\alpha_EP^E = \alpha_E$. Hence $P^E$ is ergodic if and only if $\alpha_E\mathbf 1 < \infty$. Thus the ergodic sets are the sets of finite $\alpha$-measure; the remaining statements follow from this observation.


Small sets and ergodic sets might seem to be related notions, but we 
shall see later that they are actually independent. 


Proposition 9-42: If $E$ is a small set and if $g$ is a bounded potential with support in $E$, then $g$ is regular at points of $\tilde E$ and $\lambda^Eg = 0$. Conversely, if $E$ is small and ergodic and if $g$ is a bounded function which is regular at points of $\tilde E$, and which satisfies $\lambda^Eg = 0$, then $g$ is a bounded potential with support in $E$.

Proof: Let $g$ be a bounded potential with support in $E$. Since $(I - P)g$ is the charge, $g$ is regular in $\tilde E$. By Theorem 9-15, $B^Eg = g$ and hence $P^nB^Eg = P^ng$. Since $P^ng \to 0$, $P^nB^Eg \to 0$. But by Proposition 1-57, since $E$ is small, $P^nB^Eg \to \mathbf 1(\lambda^Eg)$. Hence $\lambda^Eg = 0$.

For the converse, $g(x_n)$ is a bounded martingale at points of $\tilde E$, since $g$ is bounded and regular in $\tilde E$. By Corollary 3-16 with the stopping time the time to reach $E$, we have $g = B^Eg$. Now, since $E$ is small,
$$P^ng = P^nB^Eg \to \mathbf 1(\lambda^Eg) = 0 \quad\text{by Proposition 1-57}.$$
In addition,
$$\alpha\left[(I - P)g\right] = \alpha\left[(I - P)B^Eg\right] = \alpha\begin{pmatrix}(I - P^E)g_E\\ 0\end{pmatrix} = \alpha_E(g_E - P^Eg_E).$$
Since $E$ is ergodic and $g$ is bounded, $\alpha_Eg_E$ is finite and hence
$$\alpha_E(g_E - P^Eg_E) = \alpha_Eg_E - (\alpha_EP^E)g_E = 0.$$
Therefore $g$ is a potential by Corollary 9-16, and it is bounded by hypothesis. Its charge
$$(I - P)g = \begin{pmatrix}(I - P^E)g_E\\ 0\end{pmatrix}$$
has support in $E$.


For potentials of total charge zero in recurrent chains, the Principle 
of Balayage takes the following form. 


Proposition 9-43: Let $E$ be a small ergodic set. If $x$ is a function bounded on $E$, then there is a unique potential $g$ with support in $E$ which differs from $x$ on $E$ by a constant function. The potential is $g = B^Ex - (\lambda^Ex)\mathbf 1$.

Proof: Let $g = B^Ex - (\lambda^Ex)\mathbf 1$. Then $g$ is bounded, and
$$(I - P)g = \begin{pmatrix}(I - P^E)x_E\\ 0\end{pmatrix};$$
therefore $g$ is regular on $\tilde E$. Since $\lambda^Eg = 0$, $g$ is a potential with support in $E$, by Proposition 9-42; and $g$ differs from $x$ by the constant $\lambda^Ex$ on $E$.

For uniqueness, let $g'$ be another such potential. Then $g_E - g'_E = k\mathbf 1$ and $g - g' = B^Eg - B^Eg'$. Then $g - g' = k\mathbf 1$. Since $P^n(g - g') \to 0$, we must have $k = 0$. Hence $g = g'$.


The result to follow is the total-charge-zero version of the Principle 
of Condensers. 


Proposition 9-44: Suppose that $E$ and $F$ are disjoint sets such that $E \cup F$ is small and ergodic. Then there exists a unique function $h$ such that:

(1) $h_j = 1$ if $j \in E$ and $h_j = 0$ if $j \in F$.

(2) $f = (I - P)h$ has its positive values in $E$ and its negative values in $F$.

(3) $h$ is the sum of a potential and a constant function.

Proof: For existence, let $x$ be a function which is 1 on $E$ and 0 on $F$, and set $h = B^{E\cup F}x$. By Proposition 9-43, (1) and (3) are satisfied and $f$ has support in $E \cup F$. For (2) we note that, if $i \in E$, then $f_i$ is the probability of return to $F$ before $E$ and, if $i \in F$, then $-f_i$ is the probability of return to $E$ before $F$.

For uniqueness, let $x$ be a function satisfying (1), (2), and (3). Then $x = g + c\mathbf 1$, where $g$ is a potential, by (3). By (2),
$$(I - P)g = (I - P)x$$
vanishes outside of $E \cup F$, so that $g$ has support in $E \cup F$. Hence $g = B^{E\cup F}g$. Therefore,
$$B^{E\cup F}x = B^{E\cup F}g + B^{E\cup F}(c\mathbf 1) = g + c\mathbf 1 = x,$$
and $x$ is uniquely determined by its values on $E \cup F$, which are fixed by (1).


We conclude this section with some results about normal chains 
which will be needed later. 


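Proposition 9-45: In a normal chain
$$G_{ik}\frac{\alpha_j}{\alpha_k} + G_{kj} - G_{ij} = {}^kN_{ij}$$
$$C_{ik}\frac{\alpha_j}{\alpha_k} + C_{kj} - C_{ij} = {}^kN_{ij}$$
$$G_{ik}\frac{\alpha_i}{\alpha_k} + G_{ki} = {}^kN_{ii}$$
$$C_{ik}\frac{\alpha_i}{\alpha_k} + C_{ki} = {}^kN_{ii}.$$

Proof: The first two expressions follow from Theorem 9-7 and Definition 9-24. For the other two, set $j = i$ in the first two.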


By Lemma 9-32, if $\lim_n(P^n\,{}^EN)$ exists, then it is finite-valued and has constant columns.


Definition 9-46: ${}^E\nu_j = \lim_n\sum_k(P^n)_{ik}\,{}^EN_{kj}$ provided the limit exists.

Proposition 9-47: ${}^E\nu_j$ exists if and only if $\lambda^{E\cup\{j\}}_j$ exists. If they both exist, then ${}^E\nu_j = \lambda^{E\cup\{j\}}_j\,{}^EN_{jj}$. Hence in a normal chain ${}^E\nu$ exists for all finite sets $E$.

Proof: ${}^E\nu_j = \lim_n\sum_k(P^n)_{ik}\,{}^EN_{kj}$ and
$$\lambda^{E\cup\{j\}}_j\,{}^EN_{jj} = \lim_n\sum_k(P^n)_{ik}B^{E\cup\{j\}}_{kj}\,{}^EN_{jj} = \lim_n\sum_k(P^n)_{ik}\,{}^EH_{kj}\,{}^EN_{jj} = \lim_n\sum_k(P^n)_{ik}\,{}^EN_{kj}.$$

Remark: Actually, it can be shown that if $E$ is small, then $E \cup \{j\}$ is small. Hence ${}^E\nu$ exists for all small sets. (See the problems.)


3. Ergodic chains 


For this section let $P$ be a noncyclic ergodic chain, and choose $\alpha$ so that $\alpha\mathbf 1 = 1$. Then $L = \lim_n P^n = A = \mathbf 1\alpha$. The dual of $P$, namely $\hat P$, is also noncyclic ergodic, and the mean first passage time matrix for $\hat P$ has all finite entries (see Proposition 6-42).

We begin by proving that all noncyclic ergodic chains are normal and 
by giving an existence theorem for potentials. 


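Lemma 9-48: For every $i$ and $j$,
$$0 \le N^{(n)}_{jj} - N^{(n)}_{ij} \le M_{ij} < \infty \quad\text{and}\quad \lim_n\left[N^{(n)}_{jj} - N^{(n)}_{ij}\right] = M_{ij}\alpha_j.$$

Proof: Summing over the powers of $P$ in Lemma 6-34, we obtain
$$N^{(n)}_{ij} = \sum_{k=0}^{\infty}F^{(k)}_{ij}N^{(n-k)}_{jj},$$
where we use the convention $N^{(m)}_{jj} = 0$ if $m < 0$. Since $\sum_{k=0}^{\infty}F^{(k)}_{ij} = H_{ij} = 1$, we have
$$N^{(n)}_{jj} - N^{(n)}_{ij} = \sum_{k=0}^{\infty}F^{(k)}_{ij}\left(N^{(n)}_{jj} - N^{(n-k)}_{jj}\right).$$
As $n \to \infty$, we obtain
$$\lim_n\left[N^{(n)}_{jj} - N^{(n)}_{ij}\right] = \lim_n\sum_{k=0}^{\infty}F^{(k)}_{ij}\left[N^{(n)}_{jj} - N^{(n-k)}_{jj}\right] = \sum_{k=0}^{\infty}F^{(k)}_{ij}\lim_n\left(N^{(n)}_{jj} - N^{(n-k)}_{jj}\right)$$
by dominated convergence, since $N^{(n)}_{jj} - N^{(n-k)}_{jj} \le k$ and $\sum_k kF^{(k)}_{ij} = M_{ij} < \infty$. Thus
$$\lim_n\left[N^{(n)}_{jj} - N^{(n)}_{ij}\right] = \sum_{k=0}^{\infty}F^{(k)}_{ij}\lim_n\sum_{m=n-k+1}^{n}P^{(m)}_{jj} = \sum_{k=0}^{\infty}kF^{(k)}_{ij}\alpha_j = M_{ij}\alpha_j.$$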
Theorem 9-49: Every noncyclic ergodic chain is normal, and $C_{ij} = M_{ij}\alpha_j$. In matrix form $C = MD$, where $D$ is the diagonal matrix with entries $\alpha_i$.

Proof: By Lemma 9-48, $C$ exists and satisfies $C_{ij} = M_{ij}\alpha_j$. Since $\hat P$ is noncyclic ergodic, $\hat C$ exists and hence $G$ exists. Thus $P$ is normal by Theorem 9-26.

Dually, $G_{ij} = \alpha_j\hat M_{ji}$, or $G = \hat M^TD$.
For noncyclic ergodic chains we can prove a stronger result than 
Theorem 9-31 about the existence of potentials. 


Theorem 9-50: If $f$ is a function with $\alpha f = 0$ and $Gf$ finite-valued, then $f$ is a charge and its potential $g$ is such that $g = -Gf$ and $\alpha g$ is finite. If, in addition, $\hat Cf$ is finite-valued, then $g$ is bounded. Dually, if $\mu$ is a signed measure with $\mu\mathbf 1 = 0$ and $\mu C$ finite-valued, then $\mu$ is a charge and its potential $\nu$ is such that $\nu = -\mu C$ and $\nu\mathbf 1$ is finite; if, in addition, $\mu\hat G$ is finite-valued, then $\nu$ is bounded by a multiple of $\alpha$.

Proof: We shall prove the dual statements.
$$(\mu N^{(n)})_j = \sum_k\mu_kN^{(n)}_{kj} = -\sum_k\mu_k\left[N^{(n)}_{jj} - N^{(n)}_{kj}\right],$$
since $\sum_k\mu_kN^{(n)}_{jj} = N^{(n)}_{jj}(\mu\mathbf 1) = 0$. Now $N^{(n)}_{jj} - N^{(n)}_{kj} \le M_{kj}$ by Lemma 9-48, and $\alpha_j\sum_k\mu_kM_{kj} = (\mu C)_j$ is finite by hypothesis. Hence, by dominated convergence, $\mu$ is a charge and
$$\nu_j = -\sum_k\mu_kC_{kj}.$$
By the dual of Theorem 9-15,
$$\nu = \mu\,{}^0N + \frac{\nu_0}{\alpha_0}\alpha.$$
To show that $\nu\mathbf 1$ is finite, it suffices to show that $|\mu|\,{}^0N\mathbf 1 < \infty$. But
$$|\mu|\,{}^0N\mathbf 1 = \sum_{i,j}|\mu_i|\,{}^0N_{ij} = \sum_i|\mu_i|M_{i0} = \frac{1}{\alpha_0}\sum_i|\mu_i|C_{i0} < \infty$$
by hypothesis. If $\mu\hat G$ is finite-valued, then $\nu$ is bounded by a multiple of $\alpha$, according to Theorem 9-31.
Corollary 9-51:
$$(I - P)(-C) = I - A$$
and
$$(-G)(I - P) = I - A.$$

Proof: A row of $I - P$ is a charge whose potential is the corresponding row of $I - A$. Since
$$0 \le (PC)_{ij} = \sum_kP_{ik}C_{kj} = \alpha_j\sum_kP_{ik}M_{kj} = \alpha_j(\bar M_{ij} - 1) < \infty \quad\text{by Proposition 6-41},$$
$(I - P)(-C) = PC - C$ is finite-valued. Hence $(I - P)(-C) = I - A$ by Theorem 9-50. The second result is dual.
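Theorem 9-49 and Corollary 9-51 can be compared numerically on a finite chain. The sketch below computes $C$ and $G$ from their defining limits in Definition 9-24, truncated at $n = 300$. It computes the mean first passage matrix $M$ independently, with $M_{jj} = 0$, by solving the usual linear system for expected hitting times. It then checks the following:

- $C = MD$ (Theorem 9-49).
- $G = \hat M^TD$ (the dual statement).
- $(I - P)(-C) = I - A$ and $(-G)(I - P) = I - A$ (Corollary 9-51).

It assumes NumPy and the hypothetical 4-state chain used above. The dual is formed as $\hat P_{ij} = \alpha_jP_{ji}/\alpha_i$.

```python
import numpy as np

# Hypothetical 4-state noncyclic ergodic chain (illustration only).
P = np.array([
    [0.1, 0.5, 0.2, 0.2],
    [0.3, 0.1, 0.4, 0.2],
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.2, 0.3, 0.1],
])
S = P.shape[0]


def stationary(P):
    """Regular probability measure alpha: alpha P = alpha, alpha 1 = 1."""
    n = P.shape[0]
    lhs = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    alpha, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return alpha


def partial_N(P, n):
    """N^{(n)} = I + P + ... + P^n."""
    total = np.eye(P.shape[0])
    power = np.eye(P.shape[0])
    for _ in range(n):
        power = power @ P
        total += power
    return total


def mean_first_passage(P):
    """M[i, j] = mean time to reach j from i, with M[j, j] = 0 (time 0 counts)."""
    n = P.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        keep = [s for s in range(n) if s != j]
        Q = P[np.ix_(keep, keep)]
        M[keep, j] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return M


alpha = stationary(P)
D = np.diag(alpha)
A = np.outer(np.ones(S), alpha)
I = np.eye(S)

Nn = partial_N(P, 300)
d = np.diag(Nn)
C = d[np.newaxis, :] - Nn                                   # C_ij ~ N_jj - N_ij
G = (d / alpha)[:, np.newaxis] * alpha[np.newaxis, :] - Nn  # G_ij ~ N_ii a_j/a_i - N_ij
P_hat = (P.T * alpha[np.newaxis, :]) / alpha[:, np.newaxis]  # dual chain

print(np.allclose(C, mean_first_passage(P) @ D))           # Theorem 9-49
print(np.allclose(G, mean_first_passage(P_hat).T @ D))     # dual form
print(np.allclose((I - P) @ (-C), I - A))                  # Corollary 9-51
print(np.allclose((-G) @ (I - P), I - A))
```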


From Theorem 9-50 we have a sufficient condition on charges for 
their potentials to exist. We turn now to conditions on functions to 
ensure that they are potentials, the same problem that we touched on 
for null chains at the end of Section 1. We shall prove as Theorem 
9-53 the result for noncyclic ergodic chains that corresponds to 
Corollary 9-17 for null chains. 


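Lemma 9-52: If $h$ is a function for which $\alpha h$ is finite, then $P^nh \to Ah$.

Proof:
$$\lim_n\sum_jP^{(n)}_{ij}h_j = \lim_n\frac{1}{\alpha_i}\sum_j(\alpha_jh_j)\hat P^{(n)}_{ji} = \frac{1}{\alpha_i}\sum_j(\alpha_jh_j)\alpha_i \quad\text{by dominated convergence}$$
$$= \alpha h = (Ah)_i.$$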


Theorem 9-53: If $g$ is a function for which $\alpha g$ is finite, then $g$ is a potential if and only if $\alpha g = 0$. If $g$ is such a potential and if $f$ is its charge, then $g = (I - A)\,{}^0Nf$.

Proof: Set $f = (I - P)g$. Then $\alpha f = \alpha[(I - P)g] = \alpha g - \alpha Pg = 0$. By Lemma 9-52, $P^ng \to Ag = \mathbf 1(\alpha g)$. Therefore, by Corollary 9-16, $g$ is a potential if and only if $\alpha g = 0$. If $g$ is such a potential, then by Theorem 9-15
$$g = {}^0Nf + g_0\mathbf 1.$$
If we multiply through by $\alpha$, we obtain
$$0 = \alpha\,{}^0Nf + g_0.$$
Hence
$$g = {}^0Nf - A\,{}^0Nf = (I - A)\,{}^0Nf.$$


Thus integrable potentials, those for which $\alpha g$ is finite, are precisely those integrable functions whose integral is zero. In particular, every bounded potential has integral zero.
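For a finite chain, every function is integrable and associativity holds. Theorem 9-53 and the formula $g = -Gf$ discussed next can therefore be checked side by side.

The sketch below reads $G$ off the finite-chain fundamental matrix $Z = (I - P + A)^{-1}$, using $G_{ij} = Z_{ii}\alpha_j/\alpha_i - Z_{ij}$. This is a standard device for finite chains that is not introduced in this section, and we use it here as an assumption. The sketch also assumes NumPy, the hypothetical 4-state chain used above, and arbitrary data `h`.

```python
import numpy as np

# Hypothetical 4-state noncyclic ergodic chain (illustration only).
P = np.array([
    [0.1, 0.5, 0.2, 0.2],
    [0.3, 0.1, 0.4, 0.2],
    [0.2, 0.3, 0.1, 0.4],
    [0.4, 0.2, 0.3, 0.1],
])
S = P.shape[0]


def stationary(P):
    """Regular probability measure alpha: alpha P = alpha, alpha 1 = 1."""
    n = P.shape[0]
    lhs = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    alpha, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return alpha


def taboo_N(P, E):
    """{}^E N: mean visits to j before reaching E from i (zero on rows/columns of E)."""
    n = P.shape[0]
    keep = [s for s in range(n) if s not in E]
    NE = np.zeros((n, n))
    NE[np.ix_(keep, keep)] = np.linalg.inv(np.eye(len(keep)) - P[np.ix_(keep, keep)])
    return NE


alpha = stationary(P)
A = np.outer(np.ones(S), alpha)
I = np.eye(S)

# Finite-chain shortcut (assumption): the limits defining G equal Z_ii a_j/a_i - Z_ij.
Z = np.linalg.inv(I - P + A)
G = (np.diag(Z) / alpha)[:, np.newaxis] * alpha[np.newaxis, :] - Z

h = np.array([2.0, -1.0, 0.0, 4.0])   # hypothetical integrable function
g = h - alpha @ h                     # remove the integral, so alpha g = 0
f = (I - P) @ g                       # the charge of g

N0 = taboo_N(P, [0])
print(np.allclose(g, (I - A) @ N0 @ f))  # Theorem 9-53: g = (I - A) {}^0N f
print(np.allclose(g, -G @ f))            # with associativity, g = -Gf
```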

Examples show that associativity of $A({}^0Nf)$ in Theorem 9-53 may fail. However, if it does hold, then $Gf$ is finite-valued and $g = -Gf$. In fact, since the columns of ${}^0N$ are bounded, $\alpha\,{}^0N$ is finite-valued. Hence, by Lemma 9-52,
$$\lim_nP^n\,{}^0N = A\,{}^0N.$$
On the other hand, by definition
$$\lim_nP^n\,{}^0N = \mathbf 1\,{}^0\nu.$$
Hence by associativity
$$g = {}^0Nf - \mathbf 1\,{}^0\nu f = [{}^0N - \mathbf 1\,{}^0\nu]f.$$
Therefore, by Corollary 9-27,
$$g_i = -(Gf)_i + G_{i0}\frac{\alpha f}{\alpha_0} = -(Gf)_i \quad\text{since } \alpha f = 0.$$


For a further discussion of this point, see the Additional Notes. 

The existence of small sets for noncyclic ergodic chains is settled by 
the following proposition. Obviously all sets in such chains are ergodic 
sets. 


Proposition 9-55: In a noncyclic ergodic chain all sets are small, and $\lambda^E = \alpha B^E$ and ${}^E\nu = \alpha\,{}^EN$. In particular, for the set $S$ of all states, $\lambda^S = \alpha$.

Proof: By Proposition 1-57, $P^nB^E \to AB^E$. Hence $\lambda^E = \alpha B^E$ and $\lambda^E\mathbf 1 = \alpha B^E\mathbf 1 = \alpha\mathbf 1 = 1$, so that every set $E$ is small. For $E = S$, $B^S = I$ and thus $\lambda^S = \alpha$. The assertion about ${}^E\nu$ is proved similarly.
Corollary 9-56: If $\alpha h$ is finite and if $(I - P)h = f$, then $B^Eh$ is finite-valued and $B^Eh = h - {}^ENf$. If $f$ has support in $E$, then $B^Eh = h$. If in addition $h$ is bounded, then $\lambda^Eh = \alpha h$.

Proof: Let $g = h - (\alpha h)\mathbf 1$. Then $g$ is a potential by Theorem 9-53, and $(I - P)g = (I - P)h = f$. The proof of conclusion (4) of Theorem 9-15 shows that $B^Eg$ is finite-valued and $B^Eg = g - {}^ENf$. Therefore $B^Eh$ is finite-valued and $B^Eh = h - {}^ENf$. If $f$ has support in $E$, then ${}^ENf = 0$ since ${}^EN_{ij} = 0$ for $j$ in $E$. Hence $B^Eh = h$ and $\alpha(B^Eh) = \alpha h$. If $h$ is bounded, associativity holds, and we conclude from Proposition 9-55 that $\lambda^Eh = \alpha h$.


For total-charge-zero potentials in noncyclic ergodic chains, the 
following is the form that the Principle of Balayage takes. 


Proposition 9-57: If $\alpha h$ is finite and $\lambda^Eh$ is finite, then there is a unique potential $g$ with support in $E$ which differs from $h$ by a constant on $E$.

Proof: For existence $B^Eh$ is finite-valued by Corollary 9-56, and we let
$$g = B^Eh - (\lambda^Eh)\mathbf 1.$$
Then $g$ differs from $h$ by the constant $\lambda^Eh$ on $E$. Since $\alpha g = 0$, $g$ is a potential by Theorem 9-53. Moreover,
$$f = (I - P)g = (I - P)B^Eh = \begin{pmatrix}(I - P^E)h_E\\ 0\end{pmatrix},$$
so that the support of $f$ is in $E$. For uniqueness, let $g'$ be another such potential. Then $g - g'$ is constant on $E$. Since $B^E(g - g') = g - g'$, $g - g'$ is constant everywhere. But $P^n(g - g') \to 0$, so $g - g'$ must be 0. Hence $g$ is unique.

The following summary of the results for ergodic chains may be helpful. We consider the set of all states as a denumerable measure space of finite total measure $\alpha\mathbf 1$; the measure assigned to state $i$ is $\alpha_i$. A function $h$ is integrable if $\alpha h$ is finite; we restrict our attention to integrable functions.

We know that an integrable function is uniquely representable as the sum of a constant and a potential. The constant is the integral of the function. Hence an integrable function is a potential if and only if its integral is zero.
We conclude this section with some results about $M$ and $\lambda^E$ which hold for ergodic chains.

Proposition 9-58:
$$M_{ik} + M_{kj} - M_{ij} = \frac{{}^kN_{ij}}{\alpha_j},$$
$$M_{ik} + M_{ki} = \frac{{}^kN_{ii}}{\alpha_i}.$$

Proof: In Proposition 9-45, substitute $M_{ik}\alpha_k$ for $C_{ik}$ in the second and fourth equations.


Lemma 9-59: In any infinite ergodic chain, for fixed $i$ and $k$,
$$\lim_j{}^iH_{kj} = 0.$$

Proof: Since ${}^iH_{kj} \le {}^iN_{kj}$, it suffices to prove the result for ${}^iN_{kj}$. But $\sum_j{}^iN_{kj} = M_{ki} < \infty$, so that $\lim_j{}^iN_{kj} = 0$.


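Proposition 9-60: In any infinite noncyclic ergodic chain, for fixed $i$,
$$\lim_j{}^i\lambda_j = 0.$$

Proof:
$$\lim_j{}^i\lambda_j = \lim_j\sum_k\alpha_k\,{}^iH_{kj} \quad\text{by Proposition 9-55}$$
$$= \sum_k\alpha_k\lim_j{}^iH_{kj} \quad\text{by dominated convergence}$$
$$= 0 \quad\text{by Lemma 9-59.}$$

Proposition 9-61: $\lambda^F_i = \alpha_i\hat{\bar M}_{iF}$ for $i$ in $F$.

Proof: By Proposition 6-16,
$$\alpha_jB^F_{ji} = \alpha_i\,{}^F\hat{\bar N}_{ij} \quad\text{for } i \text{ in } F.$$
Summing on $j$ gives
$$\lambda^F_i = \sum_j\alpha_jB^F_{ji} = \alpha_i\sum_j{}^F\hat{\bar N}_{ij} = \alpha_i\hat{\bar M}_{iF}.$$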
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Thus Proposition 6-24 is the assertion that all sets are small in an 
ergodic chain. 


Lemma 9-62: If S = M,; + Mp, then 


N jj 
Sy << “a; 
Sy = Si = Íy = Sin 
M; = Sis 
and 
My _ "ry 
My *K, 


Proor: The first result is a restatement of Proposition 9-58. Since 
iÑ; = 1Nj,, we have Sy = Sy; Sy = S; by definition. Finally, the 
last two results follow from the identities 


which are consequences of Theorem 9-49 and Lemma 9-25. 


From this lemma we see that 
My = Sq = Sy = AM; + My), 
a formula which gives a means of computing M from quantities in P. 
Moreover, from Proposition 9-61 we have 
À, = aMi i. 
so that the lemma gives, on multiplication by S; 
iN., 
it 


i 


M; = KS 54 = a Miun 


or 
— N.M 
My = Ni Miui 


Proposition 9-63: In any infinite noncyclic ergodic chain 


| M, 

lim —* = 0, 
j ij 

lim M, = +0 
j 


and 
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Proor: By Lemma 9-62, 
My _ y 


My; 7K, 


By Proposition 9-60 the numerator tends to 0 and the denominator 
tends to 1. Hence the first assertion follows. Therefore, for all but 
finite many states j, 


= 1 
2M; z My + My = My = ys 
j 
Since > a, < œ, «; —> 0 and M, — œ. Finally 
157 1 
(Cy — Gula; = My — = = M; - Z% tN; 
j j 
= My — ‘AS = My — AlMy + My) 


M, 
= mfi e a,(1 + z)h 


ij 
The factor in brackets tends to 1 since ‘A; > 0 and M/M, — 0. Thus 
the third assertion of the proposition follows from the fact that 
M; —> ©. 
Corollary 9-64: In any infinite noncyclic ergodic chain, C ¥ G. 
Proor: If C = G, then (Cy — Gy); = 0 for every j. 
On the other hand, there are many finite ergodic chains with C = G. 


An example is 
l-a a 
P= 
a l-a 


4. Classes of ergodic chains 


for0O <a< l. 


Let P be a recurrent chain. In this section we shall investigate the 
finiteness of the rth moments of certain random times and obtain 
formulas for these moments. Let 


Mp = Mitr], 
bm = Mit,"], 
C= > o, M,[t,7] or c? = aM”. 


These quantities are rth moments of the first passage times, the return 
times, and the equilibrium first passage times, respectively. 
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Let 0 be a distinguished state and write 


{0} {0} = 

{0} (Poo U Op 

P= ~ ( ) and «= (a a). 
{H\R Q 


The chain °P obtained from P by making 0 absorbing is an absorbing 
chain and has N = 5 Q* as its fundamental matrix. The time to 
absorption a in the chain °P is the same as the time t, to reach 0 in the 
chain P. As in Section 8-8 we let a be a column vector indexed by 
the transient states of °P and satisfying 


ay = Mito] = M. 


Since MQ = 0, Proposition 8-68 enables us to compute any column 
of M™. From the relation cf? = ga we also have a formula for the 
computation of c. An easy calculation gives 6% in terms of a™ 
with m < r: 


bP = Molto] = > Por Malto + 1) 
k 


= Po + >. Po p> (n) M;[to"] 


k#0 
> (7 (m) 
m 
Poo + o (z) Ua™, 


The first three propositions to follow give conditions for the finiteness 
of M™, b, and c”. 


Proposition 9-65: If r > 0, then bf < œ if and only if c¥-P < œ. 
Proor: Sincea™ < a” for m < r, bY < œ if and only if Ua” < oo. 
Multiplying the inequalities of Corollary 8-69 through by U gives 
Ua” < cUNa"-) < dUa™. 
For j # 0, 
— a; 
(UN); = > Por Ni; z No; = 


&o 
Thus 
1 1 
UN mat area A T-1i] = — cf, 
a ao 2 a; Mi[t0 t] Gy 0 


and c87? < oo if and only if bf < œ. 
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Lemma 9-66: For any two distinct states i and j, let t;;(w) = 
t;(w,,) be the additional time needed to reach j after i. Then for any 


powers r and s 
M,{t,” tj] = M.t] Mi[t;]. 


PROOF: 


M,{t,’ ti] = > Pr,{t, 


m A tij = nj mn’ 


> Pr, [t; = m] Prt; = n] mn 


> Pr,[t; = m] Pr,[t,; = n | t; = m] mns 
mn 


by the strong Markov property 


> Pr,[t; = m] ae Prt; = e) 


= M,{t”] M{t,’). 


Proposition 9-67: The vector b” is finite-valued if and only if M is 


finite-valued. 


Proor: By definition b = Mt]. Let t = min (t, t,) for i # j, 


and let u = t; — t (or Oif t = œ). Thent, = t + u > u, so that 


bP = Mfu] 
= > Pr,[x, = k] M,{t,"] by Theorem 4-11 
k 


IV 


Prix, = 2] M{t,’] 
‘AM. 


Since iH, > 0, if b < œ, then M < œ for all i. 


Conversely, suppose that M is finite-valued. Since t, < ti + ti; 


for any state i # j, 
bP < Milt + 6.,)"] 


r 
(7) M,{t,” 75 


l Í 
M- iM: 


(n) MP Mi-™ by Lemma 9-66. 
o \m 


m= 


Hence 6 is finite-valued. 
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Proposition 9-68: If bf < œ for some state 0, then bP < œ for 
all j. 


Proor: The proof is by induction onr. Forr = 0, b% = 1, and the 
result is trivial. Suppose that the result holds for r < n and that 
byt < oo. Then bf < œ forr < n, so that bf < œ forr < n by 
inductive assumption. By Proposition 9-67, M is finite-valued for 
r < n, and by Proposition 9-65, cf? < œ forr < n. Thus forj # 0 

ce) = > oy, M,Lt,"] 
k 


< 2 œr Mil(to + to,;)"] since t; < to + to, 
n 

-3a > (7) Malta’ 3 

2, Me > (7 \ugug- r) 

S ( joug- r) < ©. 


r=0 


Therefore b+!) < co by Proposition 9-65. 


Finally we show the connection between the moments in P and those 
in Ê. 


Lemma 9-69: FY? = FR in a recurrent chain. 


Proor: By Lemma 6-34, for n > 1, 


By induction we see that FS} is a function of PW for k < n; similarly 
Fo i is the same function of ÊY. Since PY = P®, FY = FH. 


Proposition 9-70: b? = 6 and c? = é. 


Proor: By Lemma 9-69, 


bs = Moir] = > VER = 2 ni PR = a Mo[é"] = bp. 
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For the second assertion, 


a; Pri[to = m] = D Pir, Peira <- Pre yo 


= > oP ohn — 1 ee Piar Pri 


where each of the sums is taken over the denumerable number of 
sequences k,,...,k,_, with each k, different from 0. Hence for 
m > 0, 


> o; Prity = m] = æo > Poy... -Pror 
i 
a(l — Hg») 


«(l1 — H-V) by Lemma 9-69 


= > a, Pr{t) =m] by symmetry. 
If we multiply through by m” and sum on m, we obtain c = é@. 
We thus arrive at a hierarchy of recurrent chains: 


Definition 9-71: A recurrent chain P has ergodic degree r if b{ < œ 
but 09+? = œ. If bP < æ for every r, then P is said to have 
infinite ergodic degree. [It should be noted that b® = 1; hence the 
degree is always defined. ] 


We may summarize our previous results as follows: 


(1) Ergodic degree does not depend on the choice of the state 0. 

(2) P is of ergodic degree r > 0 if and only if c¥~)» < œ but 
c = œ. (The choice of 0 is immaterial.) 

(3) P is of ergodic degree r if and only if M“™ is finite-valued but 
aM is infinite-valued. 

(4) P has the same ergodic degree as P. 

(5) If P is of infinite ergodic degree, then M™, b”, and c™ are finite- 
valued for all r. 


For example, null chains have ergodic degree 0 and ergodic chains 
have ergodic degree at least 1. We shall see in Section 6 that the basic 
example may be of any degree r = 0,1, 2,..., œ. 


Proposition 9-72: Every finite recurrent chain has infinite ergodic 
degree. 


Proor: For a fixed state 0 the result follows by induction on r from 
Propositions 9-68, 9-67, and 9-65. 
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5. Strong ergodic chains 


A strong ergodic chain is a recurrent Markov chain of ergodic degree 
2 or greater. Every strong ergodic chain is ergodic, and by Proposition 
9-72 all finite recurrent chains are strong ergodic. By Proposition 
9-65, P is strong ergodic if and only if aM is finite-valued. If P is 
strong ergodic, then so is Ê. By Proposition 9-70, «aM = «M. 

In this section P is a noncyclic strong ergodic chain and « is chosen so 
that a1 = 1. 


Proposition 9-73: If P is strong ergodic and if f is a bounded function, 
then Gf is finite-valued. If, in addition, af = 0, then f is a charge and 
its potential g satisfies g = — Gf. 

Proor: If |f| < k1, then 

G|f| < kG1 = k(M7D-)1 = kM a? = ka) < œ. 
Hence G|f| is finite-valued. Ifaf = 0, then fis a charge with potential 
— Gf by Theorem 9-50. 


Definition 9-74: For any noncyclic ergodic chain P, a matrix Z is 
defined by Z = >°_,(P — A)" whenever that sum exists and is 
finite-valued. 


Proposition 9-75: Z exists if and only if P is strong ergodic. If P is 
strong ergodic, then Z = A — G(I — A). 


Proor: Suppose P is strong ergodic. Since a(l — A) = 0, each 
column of J — A is a charge with potential 


AGE Saye > PI — A) 


by Proposition 9-73. By induction we verify that for n > 0 
P" —~A=(P-— A) 
and hence P"(I — A) = P” — A = (P — A)". Therefore 
-GI-A)=I-A+ > P(L- A)=1I-A+ D> (P- AY 


n=1 n=1 


-A + 5 (P — AY. 
n=0 


Hence Z exists and equals A — G(I — A). 
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Conversely, suppose Z exists. Then 


(eM);0; = (aMD~*); = (aC), 


< lim inf > [e (N97? — N&-P)] by Fatou’s Theorem 
n k 


= m inf (NGD — na;) 


S aint S (P 
m=0 
nod 
= anne aI (P — A)"],; — A 


= Zy E 


Hence «M is finite-valued, and P is strong ergodic. 


Proposition 9-76: If P is strong ergodic, then 


(1) dual Z = Z. 

(2) Z1 = 1 and aZ = 
(3) ZI — P) =I- oe (I — P)Z. 
(4) ZI- P+ A)=1=(1- 
(5) Z=A-—GI-A)=A-(I — AJ. 


Proor: For (1) we have 


dual Z = dual (> (P — ar) = > [dual (P — A) 
n=0 


n=0 


= S (Ê — A} = Ê. 
n=0 
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Hence in conclusions (2) through (5) the second result is always the 
dual of the first, and we need verify only the first. Conclusion (5). 


comes from Proposition 9-75. For (2), we have 
= [A — GU — A)jĵ1 = Al — (G — GAM 
= 1 — G1 + GA1 since G1 < œ by Proposition 9-73 
=1-—-G14+ G41 
= 1, 
For (3), 
Zi — P) = [A — GW — A)(I — P) 
(-G + GA) — P) 
-G + GA + GP — GAP, 
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since by Proposition 9-73 all terms are finite-valued, 


-G+GA+GP-GA 
—G(I — P) 
=I — A by Corollary 9-51. 


ll 


Conclusion (4) follows directly from (2) and (3). 


Proposition 9-77: In a strong ergodic chain 


C= HZ, -Z 
and 
M = (EZ,, — Z)D. 


Proor: The second assertion follows from the first, since C = MD-}, 
For the first we have 


n 
Z; — a; = lim > (Pm — A); 
n m= 


= lim [NP — (n + 1)æ] 
and similarly 
Z; — æ = lim [NYP — (n + 1)a] 
Hence 
Zy — Zy = lim [NH — NP] = Cy. 


The next proposition shows that Z may be used as a single potential 
operator in place of both —C and —G. Since (dual Z) = Z, duality 
takes as simple a form as for transient potentials. Beginning in 
Section 8 we shall develop an operator — K which exists for all normal 
chains and which has properties similar to those of Z. 


Proposition 9-78: In a strong ergodic chain, Z may be used as either 
a right or a left potential operator. 
Proor: If g = —Gf and af = 0, then 
g =[A - GU — A)lf = Zf 
by Proposition 9-75. The result for signed measures is dual. 
The operator Z is used in Kemeny and Snell [1960] in the analysis of 
finite recurrent chains, which are all strong ergodic. A number of 


quantities associated with recurrent chains are computed in that book 
in terms of Z. The proposition to follow is a sample. 
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Proposition 9-79: If P is strong ergodic, then 


(2) Z; = Hs 5 
(3) May — My = Ma- My. 
Proof: From Proposition 9-77 and 9-76, 
aM = («EZ — «Z)D = (EZ4, — «)D = EZ,,D — 17, 
and (1) follows. For (2), 


= is Z, 
My = My +My = Mg+l= a 
Finally by (1) and Proposition 9-77, 
M,- My =Ži-1= 12 hua- Mp 
a; a; 


6. The basic example 


The basic example P is recurrent if and only if 8, > 0. Then £ is 
regular, and we choose « = 8. First we consider the case where P is 
null, that is, where >, 8; = +00. 

Since a null chain with high probability will be outside a given 
finite set after a long time, the finite set must be re-entered from the 
left. Hence if Æ is finite, 


i 1 if 7 is the first state of E 
D 0 otherwise. 
In particular, 
l1 ifj<i 
i = 
0 ifjzi 
Since 
BF aaa 
1 1 = ifj <1 
N; iH = iff. = Bi 
Lis H; ji ages j 
1 ifj 24%, 
we have 
B; ifj< i 
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To compute C, we note that the reverse chain Ê enters finite sets from 
the right. If Æ is finite, 


ae l1 if zis the last state of E 
E 0 otherwise. 
Hence 
A 1 ifj>i 
0 ifj <i 


1 ifj>7 
Gi; = À; N a 
0 ify <i 
and 
B; 

Ai => ifj<i 

PAR Sake * 
0 ifjzi 


Note that C = G and that —C or —G is obtained if, in the transient 
case, we let 8a = +00 in the formula for N. 

Let us consider ÀF for infinite sets. For convenience we assume 
that Oisin Æ. The probability of entering Æ in the long run at a state 
j > Ois no greater than the probability of being between 0 and j in the 
long run, and hence àF = 0 for j > 0. For any state k, let k’ be the 
next state in Æ (with the convention k = k’ if ke E). Then 


F By: 

[3 ra $2) el 

re 2, Ok By 00 
Bie 


1 — lim > P® 
n 2 ne Bx 


Therefore the last term must be 0 for a small set. 
We shall use this criterion to give an example of an ergodic set which 
is not a small set. Let 4 < p < l and 


NM = li PW BE 
0 a 0k Pko 


p if iis a power of 2 
Piisi . 
l1 otherwise, 
and let E be the set of all powers of 2 together with 0. Then 
1 ift=0o0rl 
p” ifi > l and 2-1 <i < 2", 


Since > f, = 2 + SP, 2""'p" = œ if p > 4, the process is null. 
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On the other hand, Die; Ri = 1 + >R-0 pP" < ©, since p < 1; hence E 
is an ergodic set. Now 

Spe = > Pers, 

k k 


k 


so that A¥ = 0 and Æ is not small. 

For this example the bounded function g = 1 is regular outside of E 
and has àg = 0, but P"g does not tend to 0 and hence g is not a 
potential. Thus Proposition 9-42 fails if we require that E be ergodic 
but not necessarily small. 

Moreover, the same process and the same set give an example of a 
function f with support in # and having «f = 0 and Gf finite-valued 
which is such that — Gf is not a potential. For any function f, 


1 i-1 
(- Gf) TAE. >. Pif; 
Bi 5% 
If we let 
0 for i = 0 and for i ¢ E. 


fi=<4-—-1 fori=1 
1 for other i e E, 


Ii 


and specialize to the case p = 4, then 


af= -1+ > 2-"=0 


n>0 


0 fri=0Q0orl 
(—Gf i = . 
2 otherwise. 


and 


By Corollary 9-17 the function 
2 for i=0or 1 
i= . 
0 otherwise 


is a potential. Since the sum of — Gf and the potential g is a constant 
vector and not a potential, —Gf cannot be a potential. We should 
note that f is not a weak charge since 


(Cf), = (Gf), = 2, f; = +0. 


With minor modifications of the above example, we can make 
A*1 = AE assume any value between 0 and 1. For example, to get 
AE = p with $ < p < 1, redefine E to consist of 0 and all states of the 
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form 2° + 1. To get values of \F < }, redefine the process (and the 
set) so that the process can move from 7 to 0 only when is a power of n. 
Now we consider the case where P is ergodic. Let 


a; = B; / > Be 
and define 
i-1 
Ce > Pr- 
k=0 
Then co = Oando, = >; Êk; Co is finite for all ergodic basic examples, 


and œ; = B,/on. 
The formula for ‘N, still applies, but AE = «B? and AF is positive 
on all of E. In particular, if i < j, 


i 
tÀ; = 2 Qk iH p; = > ou Be = (j oo t)or;. 


k=i+1 
Ifi > j, then 
tA; = 1-7, = 1 — (i — jja. 
Therefore 
(j — i)a if7>% 
Gi; z y N; = la; 


— + (j — i)a ifj <i. 
æi 


To get C, we first compute G. Ifi >j, 


CA a; oj 
Oo Qj Co 
Ifi <j, 
e GC. a; 0 
ane 2 ae 
Soo Qi Oo 
Thus 
æ; GC, (og . 
a e ifj<i 
2 Qi O» To 
Gy = 
a; Cj o 5 . . 
1 + 2 — L2 ifj>i, 
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and 
“moan if j>i 
& To a; Co E 
Cy = Gy = 
SE AE EI if 925, 
Ton hi Co a; l 
Also 
gau -jey  ifjsi 
ee | $ $ sk P 
d a? (i -j +1 ifj>i. 


In an ergodic basic example, C is never equal to G. For suppose 
Co; = Go; for all j. Then 


Oj A 
PE Iaj 
or 
o; = JP; 
for all j. By induction, 8; = ß;+ı for every j, in contradiction to the 


fact that 8; must tend to 0 in a recurrent basic example. 
In an ergodic basic example, we have 


l c; loa eat . 
Sees ee ifj >% 
A; To A; On 

M; = 
lo; loa 1 . 
== ——— + ifj <i, 
a; Ta a; Tn a; 


since M, = Oyla. 

It is clear that M,[t,"] is finite if and only if M,[t)"] is finite. In other 
words, ergodic degree is independent of the state. On the other hand, 
M,[t”] is finite if and only if M,[t,"] is finite and if and only if M,[t."~+] 
is finite. But trivially, M,[¢."~!] = Èp apk"! = (1/00) >, Brk” t. 
Hence P and Ê are of ergodic degree at least n if and only if >), B,.k"~+ 
is finite. The chain with p; = (¢/(¢ + 1))"*} has B, = (k + 1)7"7}; 
for this chain >), k"~48,, < œ while >, k"8, = oo, and the chain is of 
degree n. To obtain a chain of infinite degree, let p; = p for every 
i > 0; such a chain represents “repetition of a single task.” Then 
Bk = p" and $, k™p* < œ for all n. 

Turning to ergodic potentials, we know that if af = 0 and if Gf is 
finite-valued, then f is a charge with potential g and 


1 
9% = —(Gf), = -> jaf; = a, Py asf. 


Hence if af = 0 and if >, ja,f, is finite, then f is a charge and its 
potential g satisfies g = — Gf. 
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We can show as follows that af = 0 is not a sufficient condition for f 
to be a charge, even in an ergodic chain. Let h be a function such 
that h > 0, oh = +œ, f= (I — P)h, and af = 0. For example, 
choose p; = (i/(i + 1))?/?, so that B, = (i + 1)~9/ and the chain is 
ergodic; then h, = Vi + 1 has all the required properties. For any 
such function h 


(I+ P+---+ Pf = (I — Ph 
and, by Fatou’s Theorem, 


lim inf (P"h) > Ah = +00. 
n 


Hence f is not a charge. 

For ergodic chains, potentials among all integrable functions are 
characterized by the fact that they have integral zero—that is, ag = 0 
or v1 = 0. We shall conclude this section by constructing a non- 
integrable potential. The chain we use will be the reverse of the 
fixed-task example with p = 4. Then a, = (4)**} and 

Prag = 
Po = ($)'*? = a. 
Let v be a row vector; the necessary and sufficient condition on v that 
v be a potential is that vP” —> 0 and that [v(J — P)]1 = 0. Now 


(0) m m-1) — -1 
PoP? = ôy and PẸ = È Part = > 4 Pe : 
k 7 


= Qj. 
Ifi > n, then PP = 8,,,,, andifi < n, then 
PM = > PO Pm-) = Pa-) = g, 
ij ik” kj Oj 1 
k 


Therefore 
Bijan ian 
Pp = l 


aj ifi< n. 


Now (vP"); = Xin PP = a dien% + vin Thus vP”—0 if and 
only if 


On the other hand, we are trying to construct v so that |v|1 = +00, 
and hence the series > 72 o v; must converge conditionally to 0. 
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The condition [v(I — P)]1 = 0 imposes more restrictions on v. 
We have 
WA — Ply = (v; = v+) — ero; 
so that [v(J — P)]1 is well defined if > |v; — v;,,;| < +00. And if 
> |v; — vj+ı| < +00, then 


[WI — P)1 = v — lim v; — > ay = —lim »,. 
j j i 


We conclude that v is a nonintegrable potential if: 


(1) >720 v; converges conditionally to 0. 
(2) > |v; =, viaal < +0. 


If vo = 0, these conditions are necessary and sufficient. It is easily 
verified that the sequence 


0, l, -1, Ł Ł —i, -4 a aie: -ġ4, —4, -ł4, ik, 


satisfies conditions (1) and (2). 


7. Further examples 


EXAMPLE 1: Independent trials process. 
In an independent trials process P, = p; independently of i, where 
p; > Oand > p; = 1. In such a chain P” = P since 


Pp = È, PRD Pys = > PR? p; = p; = Py. 
k 


It follows that P is recurrent and that A = lim P” = P; the chain is 
noncyclic ergodic and «, = p;. In addition, 


Z=I+ > (P*— A) =], 
n=1 


so that P is strong ergodic. 
If af = 0, then f is a charge with potential g = Zf = f. If p1 = 0, 
then u is a charge with potential v = uZ = p. 
We have 
Pro[t) = k] = (1 — Po)" Po. 
Hence P is of infinite ergodic degree. To compute the M-matrix, we 
note that 
Prft; = k] = (1 — p)" ip; 
and therefore 


1 1 


M., = kal — je-l = (3) - > 
ij Pi > ( Pi) Pi př D; 
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EXAMPLE 2: Reflecting random walk. 

We return to the reflecting p-q random walk of Example 1 of Section 
6-7. We first determine its ergodic degree. From 0 the chain returns 
to 0 either in one step or in an even number of steps, say 2n, and 


Proltp = 1] = 4. 


Each path returning to 0 in 2n steps has n steps to the right and n to 
the left. Thus the probability of such a path is (pg)". From Feller 


— 2 
[1957], p. 71, we see that the number of such paths is H Ji 


Therefore 
2n — 


Prolto = 2n] = K A 


i ) = (pq)”. 


We know from earlier results that the chain is recurrent if and only if 
p < 4. The moments of the return time to 0 in this case are 


5 z 2n — 2\ 1 
rT) — r i n 
Mai =a + S eay( r a) 5 (7a 
By Stirling’s formula, for large n 
2n — 2\ 1 
r pint Min n yt —3/2 
(ony (>) 5 (at ~ lipan 
where c is a constant. If p = 4, then 4pq = 1, the series converges 


only for r = 0, and the chain is null. But if p < 4, then 4pq < 1 and 
all moments are finite. We may summarize as follows: 


p>t transient 
If< p = 34>, then P is< null 
p< ergodic of infinite degree. 


If p < 3, we can find C = M,a; from the calculation of M, in 
Section 6-7. The reflecting random walk satisfies P = P, and hence 


aC 
a 


Gy = Gi; = = aM, 


If p = 4, the process is with high probability far from state 0 after 
along time. Hence ‘A; = lifj > 1%. Thus 
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We may compute ‘N, for j > i as follows. Forj > 0, 


Ay; = 4 Aya + Has 


= $+ $°H;_1,; 
1l lj— 1 
EE pa Sata from Section 5-4 
2 2 
= 1 _ t, 
2j 


Therefore, °N,, = 2j and '‘N,,; = °N;_,;-; = 2(j — t). Thus 
2(j— i) ifj>i 

Gy = { ar 
0 if j 


i. 


lA 


To find the C matrix, we note that 
Ci; = es ji since P = Ê 
ay 
= Q; since a = 17. 


In the case p = 4, the condition af = 0 is the condition >, f; = 0. 
Then 
(Gf = 2 > G- OF 


j>i 


= 22 0 -Df + z2 i- Òf; 
2D i +2 2 Om De 


Now 


(Cf); = (Ef); = (f7@), = 2 2 fG — 4); 


and the right side is always finite. Thus if Gf is finite-valued and 
af = 0, f is a weak charge. That is, if >, f; = 0 and if >, j |f| < œ, 


then 
9 = -22 it -2 > (i -Df 
j<t 
is a bounded potential. 


EXAMPLE 3: Sums of independent random variables on the line. 

We consider one of the chains covered in Proposition 5-22; they 
represent sums of independent random variables on the line with 
finitely many k-values. If the mean of the k-values is zero, the process 
is recurrent. And since « = 17, such a chain must be null. 
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In a recurrent process of this type 
a 
j = = ( 
[np — Ng] = [NP — NP] = (NY — NP]. 


It follows that G exists if and only if C exists, and if they both exist, 
then C = G. We are going to show that for this process ‘A; does 
indeed exist for all i and j and hence the process is normal. 

Noting that 


nr, = um 2 PO tH pj, 


we write for fixed N 


N 
> PUR As = > PRH, + > Poe Ay + > PR Ay; 
N k<-N k>N 


k ké- 
Suppose for the moment that the two limits 


Hysj = lim ‘Ay; 
k> +0 
and 


i e ; i 
Hoos = lim 'Hy 


exist. Choose N large enough so that 


F |H py = ss ee < € 
an 

|H -kj = Hg | < € 
for k > N. Then 


N 
> PRA = > PR'Huy + ‘Houf 2 Pe) 
k k=-N k<-N 
+ Hoos 2 PR) + È Pew 
k>N k : 
where 
0 for -N<k<WN 
Ek = 


; i . 
‘Hy; — ‘H L o,; otherwise. 


Since |e,| < ¢, the last term on the right is less than « in absolute value 
for every n. Moreover, the first term on the right tends to zero with 
n, since P is null. Finally by the Central Limit Theorem (Theorem 
1-68) 


Di = 


lm > PR = lim X PR = 


n k<-N n k>N 
Therefore ‘A, exists and satisfies 


A; T Hoj + 2H goj 
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The missing step in the argument is the proof that ‘H_,, and 
'H o; exist. We shall now show that ‘H _,, ; exists; the other proof is 
similar. Since ‘A; = °A;_; and °A_, = fà = l — °A,, it suffices to 
prove that °H_,,; exists for j > 0. If v is the largest k-value and 
E = {0,..., v}, then 

°H p; = BZ ks Ass, 
seE 
since neither 0 nor j can be reached from the negative side except 
through E. If we can show that BE o s = lim,._.. BE, exists, then 
since Æ is finite, we will have 


H ee > BE zs Hg 
seE 
Thus form the ladder process P+. Then př = 0 unless 0 <j < v. 


By the special choice of Æ we have B®,, = (Bt)£,,. Letting 
fi = pj and u, = H*,o, we see that ug = 1 and 


n n-1 n-1 
ae + + = + = 
Un = > PeH? a-w. = > Par -xo = > Sn-KUe- 
k=1 k=0 k=0 


Hence by the Renewal Theorem (Theorem 1-67), we have 


1 


1 + pan, 
lim Htp = ~p 


n= œ H 


where pt = X jp}. But (Bt)®,9 = Htp o, and 
At, = (Bt) 2 ks + > (Bt)? Hits. 


j<s 
Since lim,.. H*,, = 1/u*, we can prove by induction on s that 
(B*)® o.s exists. Therefore °H_,,, exists, and P is normal. 

We conclude this section by treating the case of the one-dimensional 
symmetric random walk, in which p- = p}, = 4. For j > 0, it is 
clear that °H_,,,; = 0 and °H, = 1. Hence ‘A, = $ and G; = 
3 ‘N,;, in agreement with Corollary 9-29. Forj > 0, °N,,; is the same 
as for Example 2 with p = 4, since the two processes stopped at 0 are 
identical. Hence °N,, = 2j. Ifj > i, then 


N; = ONG -iji = 2(j — 2), 
whereas if j < i, then ‘N;,; = Na = 2(i — j). Hence 
Gij r tN; F |i - j|. 


Thus the potential operator is the absolute value of the distance, just 
as in classical one-dimensional potential theory. The conditions 
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af = 0 and Gf finite are the conditions >; f; = 0 and >; j|f;| < œ. 
Since Cf = Gf, we see that if > f; = 0 and >j|fj| < œ, then 


n= => | — jh; 
J 


is a bounded potential. 


8. The operator K 


We introduce a new matrix to serve as potential operator for re- 
current chains. The operator K will combine many of the properties of 
C and G and will have the single drawback that some of its entries may 
be negative. Our procedure will be first to tie the K-operator into our 
present notion of potential, then to define in terms of K the recurrent 
version of capacity, and finally to introduce so-called generalized 
potentials, in which we do not require total charge zero. With the 
generalized potentials we shall be able to prove analogs of the classical 
potential principles. 

Let a distinguished state 0 be specified. 


Definition 9-80: The K-matrix is defined by 
Ky = lim [gg © - mp, 
n> o Qo 
whenever the limit exists. 


Lemma 9-81: If the indicated entries of C and G exist, then 
Ky = Cy + (Go; — Cos) 


and 
Qa; 
Ky = Gy + (Cio — Gio) = 
0 
PROOF 
K,,; = lim [vež = Ng 
0 


= lim [se = = Ng| — lim [N 
o 


= (Go; — Coj) + Ci 
By Proposition 9-45, 


@ — NG] + lim [NY — NP] 


a; a, 
Gio + Go; — Gy = Nis = Cio = + Coy — Cis, 
Qo Zo 


and hence 
a; 


Ci; + (Go; — Coj) = Gu + (Cio — Gio) 


Qo 
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Lemma 9-82: K exists if and only if P is normal. 


Proor: If P is normal, then C and G exist, and hence K exists by 
Lemma 9-81. Conversely, suppose K exists. Since Kj) = Cio and 
Ko; = Gos, Cio and Go, exist for all i and j. Thus P is normal by 
Theorem 9-26. 


Lemma 9-83: K = dual K. 


Proor: Since G = dual C and C = dual G, we have 


(æla) R; = (ae)| Gy + (Gyo i Gio) =| 
= Cy + (Go; — Coj) = Ki. 


The fact that K = dual K is the key property that the K matrix 
has and the C and G matrices lack. It is what is behind our first 
important result. 


Theorem 9-84: Let 


v = lim [u(I + P +--+ P”) 
and 
= lim((I + P +--+ P”f] 


be potentials of weak charges in a normal chain P. Then v = —pK 
andg = — Kf. If Y is any matrix such that for all potentials v and g 
of finite support v = pY and g = Yf, then Y = —(K + k(1a)), where 
k is a constant. 


Proor: By Theorem 9-31, v = — uC andg = —Gf. But from Lemma 
9-81 and the fact that u1 = af = 0, we see that —uK = —pC and 
— Kf = —Gf. Hence K has the desired property. If Y is an operator 
that serves for charges p, then pY = —pK, so that (Y + K) = 0. 
Taking p to be the row vector with 1 in the ith entry and — 1 in the jth 
entry, we see that p is a charge and that the ith and jth rows of Y + K 
are equal. Hence Y + K has constant columns. A similar argument 
with potential functions shows that Y + K has rows proportional to «. 
Therefore Y + K = —k(1ca) for some k. 


Actually at any time when v = —yC or g = —Gf, we may use K 
in place of C and G. Thus K serves for both functions and measures. 
And the theorem shows that the only other two-sided potential 
operators differ from — K by a multiple of 1c. 
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We note from the definition of K that if C = G, then K =C = G. 
Thus in the classical cases (the symmetric random walks), the three 
operators coincide. 

We introduce the notation 


I-P® 0 
Fd, = "i/o, UF = Fla, and WE = | i i} 
Thus Ēd = dual £f and IE = dual 1°. 


Proposition 9-85: If P is normal and Æ is a finite set, then 


(—K)W* = BE — 1AF 
and 


EN 
W4(—K) = ( ) — Fa. 
0 


Proor: Lemma 9-37 proves that a column of BE — 1AF is a potential 
with charge the corresponding column of WF. The first result then 
follows from Theorem 9-84. The second result follows by duality 
from Proposition 6-16. 


Corollary 9-86: If P is normal and Æ is a finite set, then 


—K,(I — PE) =I —1A% 
and 
(I — P®)(—K,) = I — lap. 


Proor: Restrict the equations of Proposition 9-85 to square matrices 
indexed by the states of E. 


We shall see shortly how Corollary 9-86 may be used to compute 
PE from K,. Although we have the formula Pë = T + UNR for 
PE, this expression is not of practical value for infinite chains, since N 
is indexed by the states of E. On the other hand, K, can be com- 
puted without finding all of K, and hence PF can be calculated from Kẹ 
for finite sets E by using only finite matrices. 

From the fact that — K is a two-sided potential operator, we see 
from the proofs of Proposition 9-85 and Corollary 9-86 that 


(I — P®)C, = —I + lea; 
and 
G,(I — PE) = =I + 1AF. 


We turn now to a discussion of capacity. Throughout the remainder 
of the section we shall assume that P is a normal chain. 
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Proposition 9-87: For any finite set Æ there is a constant k(Z) such 
that 


ll 


MK, = k(E)eg and Kaf = k(£)1. 
Furthermore, 


h(E) = AK, and k(E) = &(E). 


Proor: In Corollary 9-86, multiply the first equation by Kp on the 
right and the second by Kg on the left and equate the results. Then 
T(AgK yz) = (Kgl§)og. 

Thus for some constant k(H), we have K,lE = k(E) and \EK, = 
k(E)a,z. Multiplication of AEK, = k(E)ags on the right by l gives 
k(E) = AEK}, since «lf = 1. The dual of this equation is k(E) = 
NER „4E, and thus k(E) = E(B). 


Definition 9-88: For a finite set E the constant k(#) such that 
Klē = k(E is called the capacity of E. 


Just as the K-matrix in general depends upon the state 0 selected, 
so capacity in general is a function of the distinguished state. If 
E = {i}, then A¥ = ô,;, and from Proposition 9-87 we see that 

Ky = k({t})o, 

or 

k({t}) = (Gor — Coi)/o- 
In particular, k({0}) = 0. 

If we form a K’ matrix by using a distinguished state 0’, then 
Ki; ac K; = (Go; = Cos) = (Go; = Co;) 
= (Coy — Goo')aæ;/ao 

by Proposition 9-45. But K = K’ + kla by Theorem 9-84. Thus 


_ Goo — Coo: E ' 
so 
and 


K = K' + koa. 
Therefore, from Proposition 9-87 we see that 
k(E) = k'(E) + k({0'}) 
for every finite set H. Thus capacity is determined up to an additive 
constant. If we let E = {0}, we find that 0 = k’({0}) + k({0’}) or that 


k’({0}) = —k({0'}). Note that since k({0’}) = (Goo — Coo')/æo, when 
C = G the capacity does not depend upon the choice of 0. 
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In general, there is no reason why k(E) should be positive. However, 
if 0 is in Æ, then 
k(E) = (KI*), = (GF) = 0, 
since KIF is constant on Æ. The dual expression is 
aok(E) = (AFC). 
In the following discussions we shall restrict ourselves to sets which 


contain the reference point. We accordingly denote by X, the 
collection of all finite sets which contain 0. 


Proposition 9-89: If Æ isin &o, then k(E) > 0 if and only if there is a 
j in E such that °A, > 0 and AF > 0 (or, equivalently, there is a j in E 
such that °X, > 0 and AF > 0). 


Proor: For Ee Lo, 
k(E) = 


| 

M 

Q 
g 


jek 
ON... 

> oa(u). 

JEE J 


Thus k(#) > Oifand only if there isa non-zero term. Since °N,,/«; > 0, 
the first assertion of the proposition follows. The equivalent condition 
follows by duality. 


ll 


Proposition 9-90: If P is noncyclic ergodic or if P is a symmetric 
sums of independent random variables process, then k(H) > 0 for all 
sets Æ in Z, with more than one point. 


Proor: If P is ergodic, then A® = «BE by Proposition 9-55. Hence 
AZ > a; > 0 for jeE. Since Ê is also ergodic, A; > 0 for j # 0. 
Therefore, if Æ contains a state other than 0, then k(E) > 0 by 
Proposition 9-89. 

If P is a symmetric sums of independent random variables process, 
then K; = 47N,, = 4 for j # i, by Corollary 9-29. If JZ > 0, then 
k(E) = K; > 0 for any other state je E. Ifl = 0, then IF > 0 for 
some j € E, and k(E) = Kol? > 0. 


Proposition 9-91: For any set E in Zo, K,~' exists if and only if 
k(E) > 0. If k(E) > 0, then 


1 


Ke = (P* + am 


-1 
LENE — 1) 
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Proor: If k(E) = 0, then AEK,; = 0 and K,~! does not exist. If 
k(#) > 0, then by Corollary 9-86, 


1 


l 
k(E) k(E) 


ENE — 1) -I-14 Ks( aap m) 


= I — 142 + 1A£ by Proposition 9-87 
= TI, 


K,(P* + 


From Proposition 9-91 we obtain the following expression for P”? in 
terms of K,~} and ag. 
Corollary 9-92: If E is a set in Z, with k(E) > 0, then 


PE=I+K; `- (Ke )(esKr t) 


rK 11 
Proor: By Proposition 9-91, 
1 
E -1 _ E)E 
PE = I + Kg rE PE 


Multiplying through on the right by 1 gives 


1 
-1q4 ~ E 
K = gap 


If we multiply both sides by «g, we get 


MO) = RT 
Hence 
jie ee 
E aK "1 
By duality, 
AE as aK 
Em apKy T 


and substitution for k(H)~ +122 gives the desired result. 


We now begin a series of lemmas which lead to the fact that k(£) is 
a Choquet capacity in the sense of Definition 8-38. 


Lemma 9-93: If Z is a finite set, then 
lim (BEN™ — N™) = Fda — FN. 
n 
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Proor: By Proposition 6-17, 
BEN® _ NM = EN pti _ EN. 
Now 
dual (EN P™+1) = Pn+1 EÑ +1} by Proposition 9-47, 
so that 
EN P"+1 — Eda. 
Lemma 9-94: If Æ is a finite set, then 
BEK = K + FN — "da 
and 


pa 
g( G) =K+N-1>. 


ProoF: Each row of B¥ — I is a weak charge. By Theorem 9-84 
and Lemma 9-93, 
(BF — I)K = —lim (BE — I)N™ = EN — "da. 
n 


The second result is dual. 


Lemma 9-95: If Æ is a finite set, then 


AEK = k(E)a + ®v 
and 


KIE = k(E)\ + Fd. 


ProoF: Multiply the equation of Lemma 9-94 on the right by J to 
obtain 


BE(KIF) = KIE + ENIF — Fd(al®) 
= KIE — Fd. 
But (KI), = K,lE, which by Proposition 9-87 equals k(#)1. Hence 
KE = k(E) + 4d. 
The other result is dual. 


Lemma 9-96: If E, F, and L are finite sets with E and F both 
contained in L, then 


[k(E) — k(F)] = AX(Fd — Fd) = (Fv — Bye, 
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Proor: Multiplying the equation of Lemma 9-95 through by A“, we 
obtain 
AKI = k(E) + à! Fd 
and similarly 
AKIF = k(F) + à Fd. 


Subtracting, we see that it is sufficient to prove that AKIF = AKI. 


But 
MKE = (AK) = k(L)a,lE = k(L) 


since the support of IF is in L. Similarly, AKIF = k(L). The other 
equality is dual. 


Lemma 9-97: For any sets A,, Az,..., A, which have at least one 
point in common, 


Baa OA IN y > 2 4N; m AVA, N 
s 


+ AVAAN oo + (Hl) tt 442 v4, 


Proor: The left side is the mean number of times in state j, starting 
in i, before the intersection of sets is reached. We shall show that the 
right side is the mean number of times in j before all of the sets have 
been reached. The former is clearly at least as large as the latter. 
Let n,(w) be the number of times on w that the process is in j before 
all sets are entered. On the path w let S, be the set of times at which 
the process is in j before A, is entered. Then n,(w) is the cardinal 
number of S; US, U- - -U S, and equals 


> n(S,) z > n(S, N Si) SADRE 
s s#t 


where n(A) is the cardinal number of A. Since n(S, N---S,) is the 
number of times in j before A, U---U A, is entered, the result follows. 


Proposition 9-98: Capacity is a monotone increasing set function, 
and for any sets A,, Ag,..., A, in Lo, 


k(A,O---0A,) < > WAY) — > (A, U A) 
i t#7 


+ > kA UAUA) 


i#j#k 


+ (-1)'t1k(A, U---U A,). 
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Proor: We first prove monotonicity. Let A C B; then 


AN > FN. 
Thus 
P"4AN > P”*8BN. 
As n —> 0, 
Ay > By, 
Therefore 
(4v — Bv) > 0. 


If L contains A and B, then by Lemma 9-96, k(B) — k(A) = 0 or 
k(B) = k(A). 


Therefore k is monotone. The other inequality of the proposition 
follows in exactly the same manner starting from the result of Lemma 
9-97. 


In the next section we shall prove potential principles for sets of 
positive capacity. It is therefore particularly annoying when all sets 
in a chain have capacity zero, and we treat this case now. 


Definition 9-99: A normal chain is degenerate if, for every choice of 
the reference state 0, k(E) = 0 for all finite sets Æ. 


Lemma 9-100: If, for a single reference point 0, k(E) = 0 for all 
Ee, and for all one-point sets, then the chain is degenerate. 


Proor: If 0’ is any other reference point, then 
k(E) = k'(E) + k({0'}) = k'(E). 


Hence it suffices to show that k(E) = 0. If Oc H, k(E)= 0 by 
hypothesis. If 0¢ Æ, let 0’ be any point of E, and 0 = k’({0’}) < 
k'(E) = k(E) < k(E U {0}) = 0. Both inequalities follow by the 
monotonicity proved in Proposition 9-98. 


Lemma 9-101: A normal chain is degenerate if and only if for every 
pair of states i and j either ‘A; = 0 and ‘A, = 1 or ‘A; = l and ÎÌ; = 0. 


Proor: If the chain is degenerate, then k({i, j}) = 0 if i is chosen as 
the reference point. Hence ‘A; = 0 or ‘A; = 0 by Proposition 9-89. 
By symmetry ’A, = 0 or 7A, = 0, and hence ‘A, = 1 or fÀ; = 1. 
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Conversely, if °A; = 0 or %X, = 0 for all states j, then k(E) = 0 for all 
E e Ly, (by Proposition 9-89 since AF < °A;). And 


0 
HH = È (Go; = Coy) = Z 


wa (hs = Ao). 
But °A, and 7X, are either both 0 or both 1. Hence k({j}) = 0. Thus 
the converse follows from Lemma 9-100. 


The basic example and its reverse are degenerate when null. It can 
be shown that degenerate chains are all of a type similar to the basic 
example. (See the problems.) 


Proposition 9-102: If P is not degenerate, then there are two states 0 
and 1 such that, with 0 as reference point, k(E) > 0 for all sets 
containing both states. 


Proor: Choose an arbitrary point as the reference state 0. Then, by 
Lemma 9-100, there is either a one-point set or a set in S, which has 
non-zero capacity. 

If k({1}) > 0 for some state, then we may choose 0 and 1 as indicated, 
since k(E) => k({1}) > 0 if le #. If k({1}) < 0 for some state, then 
use 1 as a new reference point, and k’({0}) = —k({1}) > 0. Hence we 
simply interchange the roles of 0 and 1. 

Otherwise k(#) 4 0 for some Hef). Then k(E) > 0. Let E bea 
set in Z, with positive capacity and containing as few states as 
possible, let 1 be a state of E other than 0, and let F = E — {1}. By 
minimality, k(F) = 0. Thus, by Proposition 9-98, 


0 = k({0}) = k(F A {0, 1}) < k(F) + k({0, 1}) — k(E), 
and 


k({0, 1}) = k(E) > 0. 
Hence any set containing 0 and 1 also has positive capacity. 
We conclude this section by applying our results to ergodic chains. 
Proposition 9-103: If P is ergodic, Sper AE Myo = k(E). 
Proor: Since aok(#) = (AFC), we have k(E) = (A€ M)o. 
This proposition enables us to give an interpretation to k(#). Since 


àE = «B®, k(E) = «(B®M), = M,[time to reach 0 after E is entered] = 
M,[t) — tz]. This function is monotone in Æ since ty — tg is monotone 
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on each path. The capacity inequalities follow from the fact that the 
time to enter all of n sets is no greater than the time to enter the inter- 
section; hence the time to reach 0 from the intersection is no greater 
than the time to reach 0 after all n sets are entered. We also see why 
k(E) > 0, if E has more than one state. 


Corollary 9-104: (AEM), = (8M); for all j € E. 
Proor: Choose j € E as reference point. Then 
(AEM), = k(E) = kB) = ÎE M). 


Proposition 9-105: In an infinite ergodic chain K + k1a has negative 
entries for each k. 


PROOF: 
(K + k1a);; = Go; = Co; + ka; 
= ( Co; eae Go; ¥ kay 
a; 
Hence 
lim ee = —œ by Proposition 9-63. 
j j 


Proposition 9-106: Let {Z,} be an increasing sequence of finite sets 
with union the set of all states S. Then lim,.... k(H,) < +œ if and 
only if the chain is strong ergodic. In a strong ergodic chain, k(E) = 
Myo — Myx, and hence M,e = My. 


ProoF: k(E) = M,{t, — tg] and therefore 
> “Mig < KE) < Mao. 
ieE 
Hence as E, > S, k(#,,) > M zo, which is finite only in a strong ergodic 
chain. Then 
Muo = Mor = k(E) = k(E) = Mo paj M ge: 


Hence M,, = Mur, since aM = oM. 
For strong ergodic chains the concept of capacity can be extended to 


infinite sets, and Proposition 9-103, Corollary 9-104, and Proposition 
9-106 hold for all sets Æ. 
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9. Potential principles 


In this section we shall assume that P is a nondegenerate normal 
chain. We shall assume that 0 and 1 are chosen according to Prop- 
osition 9-102, and we denote the collection of all finite sets containing 
both 0 and 1 as Z. Thus capacity is uniquely defined, k(E) > 0 for 
E e and K,~} exists by Proposition 9-91. 


Definition 9-107: Let f be any function whose non-zero components 
are on a finite set E. Then f is the charge of the generalized potential 
g = — Kf, with support E and total charge af. Iff > 0, then g is a 
pure potential. Let u be any signed measure whose non-zero com- 
ponents are on Æ. Then p is the charge of the generalized potential 
measure v = — pK, with support E and total charge u1. 


In this section g and f will be used only for generalized potentials and 
their charges. 


Proposition 9-108: g determines f, and if the support is in E e Z, then 
gg determines g. 


Proor: Since He Z, k(E) > 0, K,~1 exists, and 


fe = —Krtge g= ~x") 
0 


To see that g determines f even if the support F fails to contain 0 or 1, 
we simply let E = F u {0, 1}. 


Proposition 9-109: For any g with support in E, Ee JẸ, 
Mg = —(af)k(E) 
g = BYg — (af) "d 
(I — P¥)gp = fe — (of )lE- 

Proor: Since AEK, = k(E)a, by Proposition 9-87, —A*g = (af )k(E). 
Since BEK = K + PN — "da by Lemma 9-94, — B¥g = —g — ¥d(af). 
Since (I — P®)(—K,) = I — lap by Corollary 9-86, 

(I — PEs = fe — lilaf). 


Proposition 9-110: If g is a pure potential and Æ is any set in Y, 
not necessarily the support of g, then 
— Mg > (af)k(E) 
g = Bg — (af) *d 
(I — P*)ge = fe — (af yli. 
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PRoor: 

-Eg = Kf 
= (k(E)a + ®v)f by Lemma 9-95 
> k(E)(cf) since v > 0 and f > 0. 

- B¥g = B®Kf 
= Kf + ®Nf — (af)®d by Lemma 9-94 
> Kf — (af) d 

or 


Bg < g + (af) *d. 
By Proposition 9-85, 
EN 
W*(—K) = ( ) — Fa. 
0 


Hence _ 
(I — P*)g, = *Nf — (af le 


fe — (af lis 


IV 


since EÑ; = 1 forie Hand f > 0. 
The recurrent Maximum Principle is the following corollary. 


Corollary 9-111: A generalized potential of non-negative total charge 
assumes its maximum on the support. 


Proor: By Proposition 9-109, g = B¥g — (af)d. But «f and Ed 
are both non-negative. Hence g < B¥g, and g; < mMaXyer Jr- 


Lemma 9-112: If g’ is a generalized potential with support in E € Z 
and if g is any pure potential such that gx < gp, then af’ > af. 
Proor: By hypothesis and by Propositions 9-109 and 9-110, 
0 > A*(g’ — g) = (—af')k(E) + («f)k(E). 
From this it follows that af’ > af, since k(Z) > 0. 


Definition 9-113: An equilibrium potential for E is a potential with 
support in Æ which has total charge one and has constant values on Æ. 


This definition of equilibrium potential agrees with the one for 
transient chains (Definition 8-27) only up to a constant multiple, but it 
will be a more convenient definition for recurrent chains. We shall 
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discuss the effect of the change after proving that equilibrium potentials 
exist. 


Theorem 9-114: Any set E in Z has a unique equilibrium potential. 
The equilibrium charge is {F and the value of the equilibrium potential 
on the set is —k(#). For j not in Æ it has the value —k(H) — ¥d,. 


Proor: Lemma 9-95, together with the fact that alë = 1, shows that 
— KI? is an equilibrium potential and has the values specified in the 
theorem. 

Conversely, if f is an equilibrium charge, then by Proposition 9-109, 
fe = (I — P®)g, + (afk = (I — P*¥)g, + UE. Since gẹ is constant 
and P¥1 = 1, (I — P¥)g, = 0. Thus fp = lẸ, and the equilibrium 
potential is unique. 


If we renormalize the equilibrium potential for Ẹ so that its value is 
one on £, then its total charge becomes —1/k(H). Thus if we were 
following the definitions of transient potential theory, we would define 
— 1/k(E) to be the capacity of E. The set function — 1/k(£) is a mono- 
tone function, as is k(H), but it does not have as nice a probabilistic 
interpretation as our choice. 

We shall now prove the recurrent Principle of Balayage. 


Theorem 9-115: If g is a pure potential and if E e Z then there is a 
unique pure potential g’ with support in Æ such that 


(3) af” > af, 
(4) af’ = —k(E)~ "g, 


and this unique potential is B¥g + (àFg/k(E)) Fd. 


Proor: Since g’ is determined by its values on Æ, there is only one 
potential satisfying (1), and its charge is 


fe = —Ke`'ge 
i Mg e JE 
= (I — P¥)g, — KE) l by Proposition 9-91 
= (Z — P*)gx + (af le 
> fe 


The last two steps are by Proposition 9-110. Thus g'is pure. Hence 
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af’ > af, by Lemma 9-112, and we have (3). Result (4) follows by 
multiplying the second equation above by «. Furthermore 


g = B*g' — (af’)¥d by Proposition 9-109 
Bg — (af)®d by (1) and (3) 
<9 by Proposition 9-110. 


IA 


Hence we have (2). The formula for g’ follows from (4) and the second 
part of Proposition 9-109. 


The potential g’ is, as in the transient case, referred to as the balayage 
potential of g on EF. 
Next, we prove the recurrent Principle of Domination. 


Theorem 9-116: If g and g’ are pure potentials, if g’ has its support in 
Ee, andifg; = gp, theng = g’. 


PROOF: 
g = Bg — (af)*d by Proposition 9-110 
= B¥g’ — (af’)*d by hypothesis and by Lemma 9-112 
=g by Proposition 9-109. 


Corollary 9-117: The balayage potential of g on Æ is the infimum of 
all pure potentials that dominate g on E. 


Proor: If g is a pure potential and fẹ = gg, then fg = gp; hence 
g 2 g' by domination. 


Proposition 9-118: If g is a pure potential with support in E € Z 
and total charge 1, then 


min g; < —k(E) < max gj. 
ieE ieE 


ProoF: Assume that the first inequality is false. Then —(Kf),; > 
—k(#) for allie E. Hence we may choose ac > 1 such that 


—K(cf) => -k(E)ņ = -KE 


on E. Thus c = c(af) < alë = 1, by Lemma 9-112, which is a con- 
tradiction. The other inequality is proved similarly. 


We retain the same definition of energy as in transient theory. 
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Definition 9-119: If g = — Kf is a potential, the energy of g, denoted 
I(g), is 
I(g) = pg = vf. 


Lemma 9-120: For any potential g with support in a set He %, 
I(g) = ve(I — PE)gg + (af)(A¥g). If P = PÊ, then 
I(g) = vell — Pg — k(E)(af)?. 


Proor: By Proposition 9-109, 
(I — P¥)gc = fe — (af iE. 
Hence 


vell — P®)gx = I(g) — (af)(vel#) = I(g) — (af)(A*9). 
If P = Ê, then \¥g = A¥g = —(af)k(E) by Proposition 9-109. 


Lemma 9-121: vg(Z — P*)ge = 4 Si jen 4 PE(g; — 9;)? = 0, and the 
value is 0 only if g is constant on Æ. 


Proor: The proof proceeds as in Lemma 8-54, but P¥1 = 1 and 
a,P® = «g; hence m, = 0 and z, = 0. Let F be the subset of all 
states of E on which g = g). If F # E, there are states ie F and 
je F such that PE > 0, since the states of E communicate. Then 


vell — P*)ge = aPC — g)? > 0. 


Lemma 9-122: If He Z, the energy of the equilibrium potential of 
E is —k(E). 


Proor: Since g = — KI is constant on E, (I — P¥)g, = 0. Since 
af = 1, ÀEg = ÀE(—k(E)4) = —k(E). Hence by Lemma 9-120, the 
result follows. 


Theorem 9-123: If P = Ê and Ee Y, then among all potentials 
with support in Æ and total charge 1 the equilibrium potential alone 
minimizes energy. 


Proor: If g has support in E and af = 1, I(g) > —k(E), by Lemmas 
9-120 and 9-121. Equality holds only if g is constant on Æ, in which 
case g is the equilibrium potential. 


10. A model for potential theory 


By an electric circuit we shall mean a denumerable number of 
terminals, some of which are connected by wires. The wire connecting 
terminals 1 and j has resistance r;; and conductance c; = l/r. If there 
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is no connection between i and j, we let c; = 0; also we define c; = 0. 
We shall assume that: 


(1) For each 1, >, c < œ. This condition is satisfied, for example, 
if each terminal is connected to only finitely many other 
terminals. 

(2) The circuit is connected. 


From physics we have the following two laws. In the present context 
the first one may be taken as a definition of current in terms of voltage: 


(1) (Ohm) If is at voltage v; and j is at voltage v,, the current 
flowing from j to t is (v; — v,)c;;. 

(2) (Kirchhoff) The sum of all currents flowing into a given terminal 
is 0. 

If an outside source is attached to a certain terminal, then the 
Kirchhoff Law (2) does not apply unless account is taken of current 
flowing from the outside source. But for all terminals ¢ which are not 
attached to any outside sources, Ohm’s Law and Kirchhoff’s Law 
imply: 

(3) de (Vi — U)ene = 0. 

If a finite set E of terminals is kept at prescribed non-zero voltages by 
an outside source and if there is a finite set F with E C F such that all 
points in the set F are grounded (kept at 0 voltage), then we shall call 
the problem of finding voltages at the points i of F — E in such a way 
that (3) holds a standard voltage problem. Note that for finite circuits 
the voltages may be prescribed at an arbitrary subset Æ of terminals. 


Definition 9-124: A Markov chain P with P1 = 1 is said to represent 
some given electric circuit if each state corresponds to a terminal and 
if any standard voltage problem can be solved in such a way that the 
voltage vector is P-regular at points of F — E. 


It follows from Theorem 8-41 that if P represents a circuit, then the 
solution v to a standard voltage problem is unique and satisfies 


v = BEvFy, 
Thus the voltage at a point i of F — E can be interpreted as the 
expected final payment if the chain is started at i and stopped at 


EU F and if a payment of v, is received if the process reaches state j 
of E. 


Proposition 9-125: For any electric circuit there is a unique Markov 
chain P such that P, = 0 and P represents the circuit. 
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ProorF: We first prove uniqueness; let P represent the circuit. Let i 
and j be any two distinct states, and let E = {j} and F = {i,j}. Put 
a unit voltage at j and ground F. Then by (3), 


> (Vi — U)ey, = 0. 
k 
Since v, = 0 except when k is i or j and since c, = 0, we have 


vi > Cik = Vil + Vli = Cz 
k 


Hence 


Cis 
Vi = u . 
Cik 

T 
(The denominator is not zero since the circuit is connected.) Now 
since P represents the circuit, 


v, = (Pv); = ` Piaty = Piwi + Piws 
k 


Since P = 0 and since v; = 1, we have 


v = Pi. 
Therefore 
Cry 
Py = ts 
> Cik 
k 


and P is unique. 
Next we prove existence. Let the circuit be given, and define 


ea, 


Hence P is a transition matrix with P;, = 0 and Pi = 1. Thus let E 
and F be finite sets with Æ C F, let v,; be specified, and let vp = 0. 
We are to show the standard voltage problem has a solution. Define 
v by 

VE 


— PREvF 
v = BEY UR 


0 
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Then v is regular on F — Æ, or 
v — > Pid, = [I — Pv], = 0 
fori in F — E. Since P1 = 1, 
> (vi — Ve)Pix = 0, 
or : 


= 0. 


E Cik 
> (vi — Up) > es 
m 


Thus 


> (v; — Ve)Gik = 9, 
k 


and v is a solution to the standard voltage problem. 


Corollary 9-126: Every standard voltage problem has a solution, and 
that solution is unique. 


Proor: Existence follows from Proposition 9-125 and uniqueness 
follows from Theorem 8-41. 


Shortly we shall show exactly how general the class of chains that 
represent circuits is. But first we shall exhibit the connection between 
the currents and voltages of this section and the charges and potentials 
of Markov chain potential theory. In so doing, what we will be 
showing is that electric circuits provide a model for the discrete 
potential theory associated with the class of chains that represent 
circuits. 

In physics, current is the time rate at which charge flows past a 
point—that is, the derivative of charge with respect to time. Markov 
chains, however, have a time scale that is discrete and not continuous, 
and the proper analog of the time rate at which charge flows past a 
point is the amount of charge that moves past the point in unit time. 
With discrete time the charge moves to some point, stays for unit 
time, and then moves to another point. Hence the magnitude of the 
current at a point is equal to the magnitude of the accumulated charge 
at that point. 

Now in a standard voltage problem, current flows in and out of the 
circuit through the terminals which are attached to the outside source. 
The above considerations lead us to define the charge at terminal 7 to 
be the current u; which flows into the circuit; u; is taken to be negative 
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if the current flows out. By Kirchhoff’s Law (2), the charge will be 
zero on the set F — E. For iin E U F the charge is given by 


B= > (Vi — Vx)Cix- 
k 


Before we can connect u and v in terms of the representing chain, we 
need one preliminary result. 


Proposition 9-127: Let P be a Markov chain which represents an 
electric circuit with conductances c;;. Then the row vector « defined by 


a= > Cik 
k 
is P-regular, and the «-dual of P is P. 


Proor: We have 


and hence 
>, aP; TE > Pi aad 


Therefore « is regular. Since P, = (a,P,,)/«;, the «-dual of P is P. 


In terms of Proposition 9-127 we can transform the equation u; = 
Dx (V; — Y,)Cy, as follows: 


ki = Vj, — 2 aP itk 
= va — 2 Vee P ei 

{(dual v)(I — P)],. 

Hence u = (dual v)(I — P). Since P = Ê, 


dual u = (I — Py. 


Let fi = y,/o;, that is, f is the dual of the vector of charges at the various 
terminals. Then 
f= (I — Py. 


We note that all pairs of states communicate in P since the circuit is 
connected; hence P is either transient or recurrent. But v has only 
finitely many non-zero entries, so that av is finite. It follows that if P 


308 Recurrent potential theory 


is transient or null, then v is a potential in the Markov chain sense. 
And if P is ergodic, then v — (af)1 is a potential. in any case, f is the 
charge. 

Conversely, if, in a chain which represents a circuit, g is a potential 
vanishing outside a finite set F with a charge f such that f has total 
charge 0 and f has support in E U F, where E C F, then g solves the 
standard voltage problem for Æ and F with the specified values g; on 
E. It is in this sense that electric circuits form a model for potential 
theory. 

We turn to the problem of classifying all Markov chains which 
represent circuits. A chain P is said to be «-reversible if Ê, the 
a-dual of P, equals P. 


Proposition 9-128: A Markov chain with P, = 0 represents a circuit 
if and only if its states communicate and it has a positive regular 
measure «œ with respect to which it is «-reversible. 


Proor: If P represents a circuit, then it has a regular measure « and 
is a-reversible by Proposition 9-127. Its states communicate since the 
circuit is connected. 

Conversely, suppose that P is a transition matrix with the stated 
properties. Introduce the electric circuit with the states of P as 
terminals and with c,; = a,P,,;. The circuit is well defined since 

Ca = Pa = 0 
and since 
Cy = Py = oP yy = Cy. 


Since the states communicate, the circuit is connected. To see that P 
represents the circuit, we note that 


a= > “Pun = >, Cie 
k k 


Thus 
c Ci; 
Pp, Shs SL, 
ij 
Q% > Cik 
k 


Finally we consider the problem of when the chain representing a 
circuit is recurrent and when it is transient. 


Lemma 9-129: Let P be a chain which represents an electric circuit. 
Let a unit voltage be put at 0, let F be a finite set containing 0, and let 
F be kept at voltage 0. The charge at the terminal 0 is 


os of 
Ho = &° Hos- 


9-130 A model for potential theory 309 


PROOF: 
Hop = 2 Pox Hye 
= > Poll — *Hyo) since *P is absorbing 
k 
= J Pal - BEY?) 
k 


= > Pox(¥o — Vx) 


l 
= a 2 Cog(Vo — Ve) 


= Ho, 
&o 


Lemma 9-130: In any Markov chain a state 0 is recurrent if and only 
if °H),—> 0 for some (or every) increasing sequence of finite sets F 
with union the set of all states. 


Proor: If 0 is transient, then there is a positive probability 1 — Hy, 
that the chain never returns to 0. Hence 


ys > 1 — Ay > 0 


for every finite set F, and °F „p cannot approach 0. 
_ Conversely, let 0 be recurrent. Choose N sufficiently large that 
H > l — e. Choose 6 close enough to 1 so that 


l—~-e<S <1. 


Then construct an increasing sequence of finite sets Ay, A,,..., Ay 
such that A, = {0} and such that the probability of stepping from any 
state of A, to a state of A,,, is greater than 6. Let F be any finite 
set containing Ay. The probability that the process started at 0 is, 
for every n < N, in A, after n steps is greater than ô". Hence 


HW < 1 — ôN <e. 


Since 
l — e< H < FAW) + “HY, 
we have 
FH > FAN) > 1 — 2e. 
But 


Hog + Hop = 1, 
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so that 
se < Qe 


for all F containing Ay. 


Proposition 9-131: Let P be a chain which represents an electric 
circuit. Then the following is a necessary and sufficient condition for 
P to be recurrent: If terminal 0 is kept at a unit voltage, if all terminals 
outside a finite set F are grounded, and if F is allowed to increase to S, 
then the current at terminal 0 tends to zero. Furthermore a necessary 
and sufficient condition for P to be ergodic is that >; ; Gij < ©. 


Proor: The first assertion follows directly from Lemmas 9-129 and 
9-130. The second assertion follows from the fact that P is ergodic if 
and only if a1 < œ, where a, = >, cy. 


11. A nonnormal chain and other examples 


We shall show by examples in this section that all three of the 
following conjectures are false: 


(1) Small sets and ergodic sets are identical concepts, or at least one 
of the notions includes the other. 

(2) All null chains are normal. 

(3) The existence of either C or G implies the existence of both. 


First, we settle the independence of the notions of small sets and 
ergodic sets. We saw in Section 6 an example of an ergodic set which 
is not small, and we shall now produce a small set which is not ergodic. 

Let P be a chain with states the non-negative integers and with 
transition probabilities 


Po = Di = Pio for i > 0 
Pa = q = 1 — pe 


All other entries of P are 0. We impose no requirements on the p; 
yet except that p; > 0 and > p; = 1. It is clear that Hao = 1 and 
hence P is recurrent. Since P = P7, a = 17 is regular and P is null. 
Only finite sets are ergodic, and thus it is sufficient to exhibit an infinite 
small set. For any set # containing 0, 


AF = lim >, PRB = lim PẸ =0 for j > 0, 
n k n 
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whereas 
A = lim 2 P% BE, 
nig PR + lim > PQ 
n keé 
1 — lim POY. 


Thus any set containing 0 such that PY} —> 0 satisfies AF = 1 and is 
therefore a small set. Let 
pi = 47k for 2k — 1 <i < 2kt1 — Q, 


The p, assume the constant value 4~* on a block of length 2". Thus 


Sp = > 2k.4-k = J, 
k=1 


Let E consist of 0 and one state, such as 2", from each block, and label 
the representative of the kth block in Æ as e,. Then 


m 
PR = PR + + PR Pes 


where Ty = fey, €v+1, €v+2,---}. We can form 2% disjoint sets like 
T y each differing from it in every representative selected from the Nth 
block on. By symmetry P}, = P, for all such sets Th. Hence 
P§?, < 1/2” for all n. Since PẸ — 0 in a null chain, 

1 


lim aup PO < hk 


and we must have PR — 0. 


Second, we shall construct a nonnormal null chain. We shall use 
generating functions and require the following facts: 


(1) If F(t) = 5, fot and G(t) = S, gt", then 


F-a o=>(> fied) 


That is, the coefficients of the series for F(t)-G(¢) are the convolutions 
È SeIn-r 

(2) The Abel sum of the series >, f, is lim,;,,- F(t). If the series 
> fa converges, then its Abel sum exists and has the same value 
(Proposition 1-62). 


Let P be any recurrent chain; we begin by deriving a necessary and 
sufficient condition for the series >, (PY? — PY?) defining Cy, to be 
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Abel summable. Let E = {0, 1} and define generating functions as 
follows: 


Agolt) = >, Fg, Aoi (t) = > OF oper 
n 


A,o(t) = > 1 POE, A(t) = >` o Foge 
n n 
Pot) = > Ppr 
H(t) = Ago(t) + Aoi(t) = > Fogir 


n 


H(t) = Ayo(t) + 4t) = > Foye 


Q(t) = > (PR — PR = Palt) — Polt) 


1 — H(t) 


OSTER 


We note that the series defining Cy, is Abel summable if and only if 
lim,.,- Q(t) exists. We shall prove that this limit exists if and only if 
lim,.,- R(t) exists. We have 


(n) — 1 k -k) 0 Rk -k 
PR = X CERP O + OFRPH-D) 
k 


or 
Polt) = Aoo(t)Porlt) + Aor(t)Pir(é). 
Also 
PR = bo + D CFRPR O + FRPR) 
k 
or 


Pit) = 1 + Ajo(t)Poilt) + 411 (t)P11(6). 
Solving these equations for P,,(¢) and P(t), we find 


Q(t) = P,,(t) ad Polt) 
l — Ao(t) r= Aoi(t) 
(1 — Aot) — Ars(t)) — Aor(t)Aro(¢) 
1 


(1 — Ago(t)) R(t) + Aro(t) 
Since Ago(1) = Hoo < 1 and since Aj,9(1) = H o > 0, lim,_,- Q(t) 
exists if and only if lim,„;- R(t) exists. 
The example of a nonnormal chain will be like the earlier example in 
this section, only “doubled.” The states are 


0, di, Mg, Agy... 
1, by, bg, b3,..-; 
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and if p, > 0, p; > 0, È p; = 1, and > p; = 1, then the transition 
probabilities are 
Poa, = pi = Pai 


=1-p, 

Pry, =p = Pro 

Poo = 4 = 1 — pi 
All other entries are 0. We see easily that Hy, = 1 and that a = 17 
is regular, hence the chain is recurrent and null. For Æ = {0, 1}, 


FQ = FY => pap, forn > 2. 
i 


a) 
£ 
£ 

l 
Ba 

| 


Therefore 
H(t) = >p? > qi 7" = pit i 
i n=2 T 1l— qt 
Similarly, 
Dae 
and we have defined R(t) by 
_ 1-H) 
RAST Ha), 


We again choose the p;’s in blocks as follows. Let {n,,} and {n;,} be two 
rapidly increasing sequences (with magnitude specified later) such that 
Ny < Ny < Ng Let there be n, consecutive p,’s equal to «, = 
1/(2*n,), and let there be n, consecutive (p;)’s equal to ep = 1/(2*n;,). 
The remainder of the proof consists in showing that for suitably chosen 
{m} and {n} 
lim R(1 — e) # lim R(1 — «,). 
We shall only sketch the argument. 
We begin by estimating R(1 — e,„) for large n. We have 


pr(l — en)? P? 
H 1 — = COC 
of én) ale ae T Pi + En 


eo foe} 
2 Ek + En Le k ek En 


For k = n, ep|(€k + €n) = 4. Choose the sequence n,, so that e, is 
negligible compared to «, when k < n. Then 
n-i 
Ho(l — en) ~ > 275.1 + 277.271 = 1 — 277+1 4 277-1, 
k=1 


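Similarly,
$$H_1(1 - \varepsilon_n) \sim \sum_{k=1}^{\infty} 2^{-k} \frac{\bar\varepsilon_k}{\bar\varepsilon_k + \varepsilon_n} \sim \sum_{k=1}^{n-1} 2^{-k} = 1 - 2^{-n+1},$$
which differs from $H_0(1 - \varepsilon_n)$ if each $\bar\varepsilon_n$ is chosen to be negligible compared with $\varepsilon_n$ but much bigger than $\varepsilon_{n+1}$. Hence
$$R(1 - \varepsilon_n) \sim \frac{2^{-n+1}}{2^{-n+1} - 2^{-n-1}} = \frac{4}{3}.$$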


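If $t = 1 - \bar\varepsilon_n$ and n is large we obtain, similarly,
$$H_0(1 - \bar\varepsilon_n) \sim \sum_{k=1}^{\infty} 2^{-k} \frac{\varepsilon_k}{\varepsilon_k + \bar\varepsilon_n} \sim \sum_{k=1}^{n} 2^{-k} = 1 - 2^{-n}$$
and
$$H_1(1 - \bar\varepsilon_n) \sim \sum_{k=1}^{\infty} 2^{-k} \frac{\bar\varepsilon_k}{\bar\varepsilon_k + \bar\varepsilon_n} \sim \sum_{k=1}^{n-1} 2^{-k} + 2^{-n} \cdot 2^{-1} = 1 - 2^{-n+1} + 2^{-n-1}.$$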


The asymmetry in $H_1(1 - \varepsilon_n)$ and $H_0(1 - \bar\varepsilon_n)$ arises because the condition $n_k < \bar n_k < n_{k+1}$ is not symmetric. We thus find
$$R(1 - \bar\varepsilon_n) \sim \frac{2^{-n+1} - 2^{-n-1}}{2^{-n}} = \frac{3}{2}.$$
Therefore, $\lim_{t\to1^-} R(t)$ does not exist and $C_{01}$ cannot exist. In particular, P is not normal.
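The oscillation of R(t) can also be observed numerically. The sketch below is a hypothetical instance and not part of the original argument. It takes $n_k = M^{2k-1}$ and $\bar n_k = M^{2k}$ with M = 1000, so that $n_1 < \bar n_1 < n_2 < \cdots$ and consecutive ε's differ by large factors. It keeps only the first K = 8 blocks, which is an approximation: the truncated chain is not exactly stochastic, but the omitted blocks contribute negligibly at the values of t used. $H_0$, $H_1$, and R are evaluated exactly with Python fractions, and equal $p_i$'s are summed block by block.

```python
from fractions import Fraction


def H(eps, sizes, t):
    """H(t) = sum_i p_i^2 t^2 / (1 - q_i t), where block k holds sizes[k] copies of
    p_i = eps[k] and q_i = 1 - p_i."""
    return sum((m * e * e * t * t / (1 - (1 - e) * t) for e, m in zip(eps, sizes)),
               Fraction(0))


def build(M=1000, K=8):
    """Hypothetical choice n_k = M^(2k-1), nbar_k = M^(2k), truncated after K blocks."""
    n = [M ** (2 * k - 1) for k in range(1, K + 1)]
    nbar = [M ** (2 * k) for k in range(1, K + 1)]
    eps = [Fraction(1, 2 ** k * n[k - 1]) for k in range(1, K + 1)]         # epsilon_k
    epsbar = [Fraction(1, 2 ** k * nbar[k - 1]) for k in range(1, K + 1)]   # epsilon-bar_k

    def R(t):
        # R(t) = (1 - H_1(t)) / (1 - H_0(t))
        return (1 - H(epsbar, nbar, t)) / (1 - H(eps, n, t))

    return eps, epsbar, R


eps, epsbar, R = build()
for j in range(4):
    print(j + 1, float(R(1 - eps[j])), float(R(1 - epsbar[j])))
```

With these choices, the second column should stay close to 4/3 and the third close to 3/2 for n = 1, ..., 4. Increasing M moves both columns closer to those limits.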

This example has the property that neither C nor G exists. The reverse process has transition probabilities $\hat P_{0b_i} = \hat P_{b_i1} = \bar p_i$ and $\hat P_{1a_i} = p_i = \hat P_{a_i0}$. Thus $\hat P$ is the same as P except that the roles of 0 and 1 have been interchanged. The above argument therefore shows that $\hat C_{10}$ does not exist, and since $\hat C_{10} = G_{01}$ if either exists, G cannot exist.

Not even reversibility ($P = \hat P$) is a sufficient condition for a null chain to be normal. A slight modification of the above example provides a counterexample. Let the states be as before, and define $p_i$ and $\bar p_i$ as above. Let
$P_{0a_i} = P_{a_i0} = \tfrac12 p_i$, $P_{1b_i} = P_{b_i1} = \tfrac12 \bar p_i$,
$P_{a_ia_i} = 1 - \tfrac12 p_i$, $P_{b_ib_i} = 1 - \tfrac12 \bar p_i$,
and
$P_{01} = P_{10} = \tfrac12$.
Set all other entries of P equal to 0. Then $P = P^T$, so that $\alpha = \mathbf{1}^T$ and $\hat P = IP^TI = P$. But the same kind of computation as for the preceding example shows that $C_{01}$ does not exist. Thus we see that even a symmetric P need not be normal.
Third, we show that the existence of one of C and G does not imply the existence of the other. Again we modify the first example of a nonnormal chain. Let
$P_{0a_i} = p_i$, $P_{a_ia_i} = q_i$, $P_{1b_i} = \bar p_i$, and $P_{b_ib_i} = \bar q_i$
as before, and use the same p's. But set
$P_{a_i0} = P_{a_i1} = \tfrac12 p_i$ and $P_{b_i0} = P_{b_i1} = \tfrac12 \bar p_i$.

It is clear that the first-return distributions to E from 0 and from 1, and hence $H_0(t)$ and $H_1(t)$, are the same as before, so that $\lim_{t\to1^-} R(t)$ does not exist and $C_{01}$ does not exist. On the other hand, the reverse chain is no longer of the same type and the argument for the nonexistence of G fails. In fact,


oA, = tim [Ps + > Poe, "Hai + > PS K 
i i 


; 1 
= lim | Pp + 55 (Ps + P) 


NI = 


Hence Gy, = 4 °N,, exists. Moreover, if x and y are any two states 
other than 0 and 1, 

“Ay = Hov + Hi 
Therefore all of G exists. The reverse chain is an example in which C 
exists but G does not. 


12. Two-dimensional symmetric random walk 


The purpose of this section is to show how the results of Section 8 
may be used to work out some of the matrices associated with the 
two-dimensional symmetric random walk. 

First we shall find the operator K. In this example, K = C = G and
$$K_{(x,y),(x',y')} = K_{(x-x',\,y-y'),(0,0)}.$$
Hence it suffices to compute one column of K. We let
$$k(x, y) = K_{(x,y),(0,0)}.$$

A row of I − P is a charge whose potential is the corresponding row of I. For this process a row of I − P has only finitely many non-zero entries and is therefore a weak charge. By Theorem 9-84, $(I - P)K = -I$. Thus k(x, y) is the average of its values at the four neighboring points, except that at the origin the average is one larger. We know also that k(0, 0) = 0. The high degree of symmetry of the chain implies that
$$k(x, y) = k(-x, y) = k(y, x).$$
In particular, the values at the four points neighboring the origin must be the same. Since their average is one, k(0, 1) = 1.
We shall need one more result. It can be shown (see Spitzer [1964]) that
$$k(x, x) = \frac{4}{\pi}\left(1 + \frac13 + \frac15 + \cdots + \frac{1}{2x - 1}\right) \quad \text{for } x > 0.$$
This identity, together with the properties above, determines the function k.

In fact, it suffices to restrict attention to $0 \le x \le y$. We know that k(0, 0) = 0, k(0, 1) = 1, and k(1, 1) = 4/π. If we know the values of k(x, y) up to a given $y_0$ for all x such that $0 \le x \le y$, then we can find the values of $k(x, y_0 + 1)$ for $0 \le x \le y_0 + 1$. The averaging and symmetry properties give
$$k(0, y_0 + 1) = 4k(0, y_0) - 2k(1, y_0) - k(0, y_0 - 1)$$
$$k(x, y_0 + 1) = 4k(x, y_0) - k(x + 1, y_0) - k(x - 1, y_0) - k(x, y_0 - 1) \quad \text{for } 0 < x < y_0$$
$$k(y_0, y_0 + 1) = 2k(y_0, y_0) - k(y_0 - 1, y_0).$$
And $k(y_0 + 1, y_0 + 1)$ is given by the identity for k(x, x).

The equations above thus are recursion equations for k(x, y). Actually these equations are so simple that we apparently have a very rapid method of computing $K_E$ for large finite sets E. Unfortunately the recursion is highly sensitive to rounding errors.

In Table 9-1 we give values of k(x, y) for a wedge in the plane. The 
computations were carried out to 9-place accuracy, but by y = 10 the 
effect of rounding errors was noticeable. Any larger table would 
require much more accuracy of computation. 


TABLE 9-1. k(x, y) FOR A WEDGE-SHAPED REGION

y\x    0      1      2      3      4      5      6      7      8      9
 9   2.429  2.431  2.444  2.461  2.486  2.514  2.546  2.579  2.614  2.649
 8   2.353  2.357  2.372  2.395  2.424  2.459  2.496  2.535  2.574
 7   2.267  2.274  2.293  2.322  2.359  2.400  2.444  2.489
 6   2.168  2.178  2.203  2.241  2.288  2.339  2.391
 5   2.052  2.065  2.101  2.153  2.213  2.276
 4   1.908  1.930  1.984  2.056  2.134
 3   1.721  1.762  1.849  1.952
 2   1.454  1.546  1.698
 1   1.000  1.273
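One way to sidestep the rounding problem, sketched here under our own assumptions, uses the fact that the recursion is linear with integer coefficients. Every seed value has the form a + b·(4/π) with rational a and b, so every k(x, y) in the wedge does too. The Python sketch below stores each value exactly as the pair (a, b) using `fractions.Fraction` and rounds only in the final conversion to floating point. The helper names `diagonal`, `combo`, `wedge`, and `value` are hypothetical.

```python
from fractions import Fraction
import math


def diagonal(x):
    """k(x, x) = (4/pi)(1 + 1/3 + ... + 1/(2x - 1)), stored as (0, coefficient of 4/pi)."""
    return (Fraction(0), sum((Fraction(1, 2 * j - 1) for j in range(1, x + 1)), Fraction(0)))


def combo(*terms):
    """Integer linear combination sum(c * v) of pairs v = (a, b) meaning a + b * 4/pi."""
    return (sum((c * v[0] for c, v in terms), Fraction(0)),
            sum((c * v[1] for c, v in terms), Fraction(0)))


def wedge(ymax):
    """Exact k(x, y) for 0 <= x <= y <= ymax via the averaging/symmetry recursion."""
    k = {(0, 0): (Fraction(0), Fraction(0)),
         (0, 1): (Fraction(1), Fraction(0)),
         (1, 1): diagonal(1)}
    for y0 in range(1, ymax):
        y = y0 + 1
        k[(0, y)] = combo((4, k[(0, y0)]), (-2, k[(1, y0)]), (-1, k[(0, y0 - 1)]))
        for x in range(1, y0):
            k[(x, y)] = combo((4, k[(x, y0)]), (-1, k[(x + 1, y0)]),
                              (-1, k[(x - 1, y0)]), (-1, k[(x, y0 - 1)]))
        k[(y0, y)] = combo((2, k[(y0, y0)]), (-1, k[(y0 - 1, y0)]))
        k[(y, y)] = diagonal(y)
    return k


def value(pair):
    """Convert an exact pair (a, b) to the float a + b * 4/pi; rounding happens only here."""
    a, b = pair
    return float(a) + float(b) * 4 / math.pi


table = wedge(9)
for y in range(9, 0, -1):                      # same layout as Table 9-1
    print(y, " ".join(f"{value(table[(x, y)]):.3f}" for x in range(y + 1)))
```

The price of exactness is that the coefficients a and b grow with y, so the final floating-point conversion involves some cancellation. At the sizes in Table 9-1 this still leaves far more accuracy than three decimals.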
If we wish to find k(E), $\lambda^E$, and $P^E$ for a finite set of points E, we first construct $K_E$, and then compute the inverse $K_E^{-1}$. Let $z = K_E^{-1}\mathbf{1}$. Since $\alpha = \mathbf{1}^T$ and since K is symmetric, the results in the statement and proof of Corollary 9-92 simplify to
$$k(E) = 1/(\mathbf{1}^T z)$$
$$\lambda^E = k(E)\, z^T$$
$$P^E = I + K_E^{-1} - k(E)\, z z^T.$$


Calculations for various sets E appear in Tables 9-2, 9-3, and 9-4.
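The three formulas above translate directly into a few lines of code. The Python sketch below is an illustration under stated assumptions. It uses a small hand-written Gauss-Jordan inverse with partial pivoting, which is needed because $K_E$ has a zero diagonal, so that no third-party library is required. It builds $K_E$ from the closed forms already in the text: k(0, 1) = 1, k(1, 1) = 4/π, k(2, 2) = (4/π)(4/3), and k(0, 2) = 4 − 8/π, which follows from the recursion. The two cases reproduce the a = 1 columns of Tables 9-2 and 9-3. The function names are ours.

```python
import math


def inverse(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small nonsingular matrix."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))   # K_E has a zero diagonal
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [x / piv for x in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def capacity_data(KE):
    """k(E), lambda^E and P^E from the symmetric matrix K_E, assuming alpha = 1^T."""
    n = len(KE)
    Kinv = inverse(KE)
    z = [sum(row) for row in Kinv]                # z = K_E^{-1} 1
    cap = 1.0 / sum(z)                            # k(E) = 1 / (1^T z)
    lam = [cap * zi for zi in z]                  # lambda^E = k(E) z^T
    PE = [[(1.0 if i == j else 0.0) + Kinv[i][j] - cap * z[i] * z[j]
           for j in range(n)] for i in range(n)]  # P^E = I + K_E^{-1} - k(E) z z^T
    return cap, lam, PE


c = 4 / math.pi                                               # k(1, 1)
line = [[0, 1, 4 - 2 * c], [1, 0, 1], [4 - 2 * c, 1, 0]]      # E = {(0,0), (1,0), (2,0)}
diag = [[0, c, 4 * c / 3], [c, 0, c], [4 * c / 3, c, 0]]      # E = {(0,0), (1,1), (2,2)}
for KE in (line, diag):
    cap, lam, PE = capacity_data(KE)
    print(round(cap, 3), [round(x, 3) for x in lam], [round(x, 3) for x in PE[0]])
```

Two consequences of the formulas are worth checking. First, each row of $P^E$ sums to 1, since $P^E\mathbf{1} = \mathbf{1} + z - z\,(k(E)\,\mathbf{1}^T z) = \mathbf{1}$. Second, for the diagonal set with a = 1 the exact values are k(E) = 3/π and $\lambda^E = (3/8, 1/4, 3/8)$.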


TABLE 9-2. k(E), λ^E, AND P^E WHEN E CONSISTS OF THREE POINTS ON A
LINE; E = {(0, 0), (a, 0), (2a, 0)}

a = 1                                a = 2
k(E) = 0.785                         k(E) = 1.082
λ^E   0.393  0.215  0.393            λ^E   0.372  0.256  0.372
P^E   0.460  0.393  0.148            P^E   0.610  0.256  0.134
      0.393  0.215  0.393                  0.256  0.488  0.256
      0.148  0.393  0.460                  0.134  0.256  0.610

a = 3                                a = 4
k(E) = 1.256                         k(E) = 1.379
λ^E   0.365  0.270  0.365            λ^E   0.361  0.277  0.361
P^E   0.663  0.212  0.125            P^E   0.693  0.189  0.118
      0.212  0.576  0.212                  0.189  0.621  0.189
      0.125  0.212  0.663                  0.118  0.189  0.693


TABLE 9-3. k(E), λ^E, AND P^E WHEN E CONSISTS OF THREE POINTS ON A
DIAGONAL; E = {(0, 0), (a, a), (2a, 2a)}

a = 1                                a = 3
k(E) = 0.955                         k(E) = 1.407
λ^E   0.375  0.250  0.375            λ^E   0.360  0.279  0.360
P^E   0.558  0.295  0.147            P^E   0.699  0.185  0.117
      0.295  0.411  0.295                  0.185  0.631  0.185
      0.147  0.295  0.558                  0.117  0.185  0.699

a = 5
k(E) = 1.622
λ^E   0.356  0.287  0.356
P^E   0.738  0.157  0.106
      0.157  0.687  0.157
      0.106  0.157  0.738
TABLE 9-4. k(E) AND λ^E FOR THREE SETS E

21 points arranged in an isosceles right triangle of base 6
k(E) = 1.611
λ^E
0.162
0.046  0.059
0.041  0      0.051
0.041  0      0      0.051
0.044  0      0      0      0.059
0.109  0.044  0.041  0.041  0.046  0.162

13 points arranged in a figure X
k(E) = 1.778
λ^E
0.153                                     0.153
       0.064                       0.064
              0.030         0.030
                     0.015
              0.030         0.030
       0.064                       0.064
0.153                                     0.153

30 points in a 5-by-6 rectangle
k(E) = 1.670
λ^E
0.105  0.042  0.038  0.038  0.042  0.105
0.044  0      0      0      0      0.044
0.041  0      0      0      0      0.041
0.044  0      0      0      0      0.044
0.105  0.042  0.038  0.038  0.042  0.105


It is hard to acquire an intuition for the capacities of sets aside from their monotonicity. However, the values of $P^E$ and $\lambda^E$ are quite intuitive. The latter, in this random walk, may be thought of as the entrance probabilities to E if the chain is started near ∞. For example, in the case of the 5-by-6 rectangle in Table 9-4, it is clear that the corner positions should be considerably more probable than the points on the side. Points on the short sides are more probable than points on the long ones, and the rectangle cannot be entered at an interior point. Equally instructive are the values of $\lambda^E$ for three-point sets in Tables 9-2 and 9-3. The middle point is least likely to be hit first, but the difference decreases as the points are spread farther apart.


13. Problems

1. Prove that ‘N,, = N, for recurrent sums of independent random variables.


2. Prove that for any null chain and for any states i and j there is a finite


set E such that 
Mp 


(n) 
Nig 


lim sup <e. 
n 


3. Let a recurrent chain P be started in state 0. Let $\alpha^E_i$ be the mean number of times in state i in the time required to reach the set E and then return to 0. Thus, for example, if E is the set of all states, then $\alpha^E = (1/\alpha_0)\alpha$.
(a) Prove that for any set E there is a constant $k_E$ such that $\alpha^E = k_E\alpha$.
(b) Let Q be the transition matrix for the transient states when 0 is made absorbing and let $\bar\alpha$ be the restriction of α to the transient states. Show that if E is a set which does not contain 0, then


1 — £H = + afl — Q)B*1], 
ao 


where $B^E$ is restricted to the transient states.

(c) Conclude that if E is a set which does not contain 0, then $1/k_E$ is the transient capacity of the set E in the chain Q, provided the distinguished superregular measure is taken as $\bar\alpha$.


4. Prove that for a recurrent chain


f: a 
i — k 
Nuc + Ny = Na 


i 


5. For the symmetric random walk in two dimensions, verify that the function whose (x, y)th entry is $(|x| + |y| + 1)^{-1}$ is a potential, using only Corollary 9-16. Show that its charge f satisfies $\alpha f = 0$.


6. Let P be the one-dimensional symmetric random walk, and let E = {0, 1, 2, 3}.
(a) Find $B^E$, $\lambda^E$, $P^E$.
(b) Find all potential functions with support in E, and find their charges.
(c) Compute αf and αg for each.


7. Let P be the symmetric random walk in two dimensions, and let a, b, and c be three distinguished states. We play a game as follows: The process is started in 0. Each time it is in a or b we win a dollar, and each time it is in c we lose two dollars.
(a) Let $g^{(n)}$ be the expected gain up to time n. Prove that $\lim_n g^{(n)}$ exists, and find a computable expression for it.
(b) What happens if the game is changed so that we lose only one dollar when the process is in state c?


Problems 8 to 10 lead to the fact that in a normal chain the union of a small 
set and a finite set is small. 


8. Let E be a small set, and let F = E ∪ {k} have one more point. Show that
$$B^E_{ij} = B^F_{ij} + B^F_{ik}B^E_{kj} \quad \text{for all } j \in E.$$
From this equation show that if $\lambda^F_k$ exists, then $\lambda^F$ exists and F is a small set.
9. In Problem 8 show that
"Hy = > Bh, "Hys for all j eB. 


meE 
Use the identity of Problem 8 to eliminate the factors BF,, and solve 
the resulting identity for BẸ. Prove from this result that Af exists, 
provided P is normal. 


10. Use the results of the two previous problems to prove that the union of 
a small set and a finite set is small in a normal chain. 


Problems 11 to 22 develop a new null example and use it to illustrate results 
in the chapter. The state space consists of all points z = (x, y) in the plane 
with integer coordinates $\ge 1$. It will be convenient to let $n = x + y$. We
let (1, 1) be our state 0. Define 


Pre +19) = ee re Puavo — E 
and 
Pev, = = 
11. Verify that P is null recurrent and that $\alpha_z = 2/[(n - 1)n]$.
12. Compute $\hat P$.
13. Prove that PYY is the same for all z with n = x + y fixed. Do the same 
for 


14. Let $E = \{z \mid x + y < n_0\}$. Find $\lambda^E$ and $\hat\lambda^E$.
15. Show that 


Pe ar 
Nax = a a n) ; ae es 


0 otherwise. 


16. Let f be defined by fo = —1, fe,» = 10, and f, = 0 otherwise. Show 
that f is a charge, and use parts (1) and (3) of Theorem 9-15 to find the 
potential g. 


17. Check that f = (I − P)g for the functions of Problem 16. Does αg = 0? Verify that $\lambda^E g = 0$ for all finite sets E containing the support.


18. Show that $G_{00} = 0$ and $G_{z0} = (n - 1)n/2$.

19. Use Problem 18 and Proposition 9-45 to show that 
(n — l)n ô 

(n’ — 1)n’ New 
20. Verify that the potential g found in Problem 16 is $-Gf$.


21. Find C, and compute a potential measure of finite support in two different 
ways (in analogy with Problems 16 and 20). 


22. Let E be the triangular set of Problem 14, and let x be its characteristic 
function. Verify Proposition 9-43. 


Gy = 
Problems 23 to 26 develop some theoretical results for an ergodic chain in terms of the operator K.

23. Express K in terms of M and M. 

24. Prove that M, = (Ki; = K). 

25. Show that M,, — My = k({i}) — k({j}). 

26. What happens to the formulas in Problems 24 and 25 if the reference point 0 is changed?

Problems 27 to 32 carry this development further for finite recurrent chains. 


27. Show that k(S) = Myo = Myo. 

28. Prove that $K\mathbf{1} = k(S)\mathbf{1}$ and $\alpha K = k(S)\alpha$.

29. Prove that Ma? = c1, where c = k(S) — >, kija. 
30. Prove that 


K = (gge + P-1) . 


31. Prove that the set of charges is the same as the set of potentials. 


32. To what extent do the results of Problems 27 to 31 generalize to strong ergodic chains?


Problems 33 to 39 are intended to illustrate Problems 23 to 32 for the Land of 
Oz example, which was defined in Chapter 4. [See also Chapter 6, Problem 
1.] We choose the middle state (nice weather) to be the distinguished state 
0. 


33. Show that $\hat P = P$ (the chain is reversible).

34. Find M. 

35. Find K, using the result of Problem 23. 

36. Find k(S), using the result of Problem 27. 

37. Find K, using the result of Problem 30, and compare with the value of K 
found in Problem 35. 

38. Check the results given in Problems 24, 25, 28, and 29. 


39. Find the most general charge and the most general potential function. 
Verify that the set of charges is the same as the set of potentials. 


Problems 40 to 48 work out the probabilistic solution of the so-called Second Boundary Value Problem in the sense that Theorem 8-41 presented the solution of the First Boundary Value Problem. Let P be an absorbing chain whose transient states communicate and whose absorbing states form a finite set B. B is thought of as our "boundary." To each state k in B we associate a "neighboring" transient state k'. For a given function h, we define its normal derivative $d_k$ at k to be $h_k - h_{k'}$. The problem is to find a function h which is regular on the transient states and has specified normal derivatives.


40. Prove that $h_T = NRh_B$ for any solution.


41. Modify the original chain so that instead of stopping at an absorbing 
state k, it moves to the neighboring k’ with probability 1. Show that 
this new chain is recurrent. 
42. Let P* be the transition matrix of the modified chain watched in B.
Prove that this is an ergodic chain. 

43. Show that the requirement that h have the specified normal derivatives can be written as $(I - P^*)h_B = d$, where d has the given values $d_k$ as components.

44. Prove that $\alpha^* d = 0$ is a necessary condition for a solution to exist.

45. Show that $\alpha^* d = 0$ is also sufficient by showing that d is a charge, that its potential will serve as $h_B$, and that the h supplied by Problem 40 is a solution.

46. Prove that the most general solution differs from the given one only by a 
constant. 

47. Prove that if the modified chain (indexed on all states) is a normal chain, then the most general solution is
$$h = -K\begin{pmatrix} d \\ 0 \end{pmatrix} + c\mathbf{1}.$$


48. Show that if the transient states are the lattice points in a bounded 
convex set in n-dimensional Euclidean space and if the process moves as 
a symmetric random walk which is stopped when it moves out of the 
convex set, then we can apply the previous results. 


Problems 49 to 53 give a complete characterization of degenerate chains. 


49. Prove that if P is degenerate, so is $\hat P$.

50. Show that if P is degenerate and if we let i < j stand for /A, = 1, then 
< is a simple ordering. 

51. Prove that if k < i < j, then ’H,, = 1. Deduce from this fact that, in 
moving to the right, the process can move at most one step at a time. 
[Hint: Consider $\lambda^E$ for E = {k, i, j}.]

52. Prove that the ordering of states must be that of the integers, the positive 
integers, or the negative integers. 

53. Show that the basic example and its reverse illustrate two of the possible 
orderings, and construct an example of the third. 


CHAPTER 10 


TRANSIENT BOUNDARY THEORY 


1. Motivation for Martin boundary theory 


For purposes of motivation it is convenient to think of the state space of a Markov chain P with only transient states as being similar to the open unit disk of two-dimensional Euclidean space. In two-space the boundary of the disk, namely the circle $S^1$, has the property that there is a one-one correspondence between the non-negative harmonic functions $h(re^{i\theta})$ in the disk and the non-negative Borel measures $\mu^h$ on the circle. The correspondence is
$$h(re^{i\theta}) = \int_{S^1} P(re^{i\theta}, t)\, d\mu^h(t), \qquad (*)$$
where $P(re^{i\theta}, t)$ is the Poisson kernel
$$P(re^{i\theta}, t) = \frac{1 - r^2}{1 - 2r\cos(\theta - t) + r^2}.$$
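To see the correspondence (*) concretely, take a harmonic function with continuous boundary values f. Its measure is then $d\mu^h(t) = f(t)\,dt/2\pi$, with P normalized so that $(1/2\pi)\int_0^{2\pi} P\,dt = 1$. The Python sketch below approximates the integral with the midpoint rule. That rule converges very quickly here because the integrand is smooth and periodic. It checks the normalization (f ≡ 1 gives h ≡ 1) and checks that f(t) = cos t reproduces h(re^{iθ}) = r cos θ, the real part of z. The helper names are ours.

```python
import math


def poisson(r, theta, t):
    """Poisson kernel P(re^{i theta}, t) = (1 - r^2) / (1 - 2r cos(theta - t) + r^2)."""
    return (1 - r * r) / (1 - 2 * r * math.cos(theta - t) + r * r)


def represent(f, r, theta, n=2000):
    """Midpoint-rule value of the integral of P(re^{i theta}, t) f(t) dt / (2 pi) over [0, 2 pi]."""
    h = 2 * math.pi / n
    total = sum(poisson(r, theta, (k + 0.5) * h) * f((k + 0.5) * h) for k in range(n))
    return total * h / (2 * math.pi)


print(represent(lambda t: 1.0, 0.5, 1.0))                   # normalization: close to 1
print(represent(math.cos, 0.5, 1.0), 0.5 * math.cos(1.0))   # Re z is reproduced
```

The discrete sum is not part of the theory. It only makes the roles of the kernel and the measure in (*) tangible.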


Transient boundary theory seeks an analogous representation theorem 
for all non-negative P-regular functions defined on the state space. 

The first problem that arises is to find what the analogs of the circle (the boundary) and the kernel should be. We would like a representation
$$h(i) = \int K(i, x)\, d\mu^h(x).$$


In the case of the disk, a calculation with Green's identities shows that any kernel $P(re^{i\theta}, t)$ giving rise to the correspondence (*) and satisfying
$$\frac{1}{2\pi}\int_0^{2\pi} P(re^{i\theta}, t)\, dt = 1$$
must be the normal derivative at t of the Green's function for the disk relative to the point $re^{i\theta}$. That is,
$$P(re^{i\theta}, t) = \frac{\partial}{\partial n} G(\cdot, re^{i\theta})\Big|_{t}.$$

Application of l'Hospital's rule shows that
$$\lim_{\substack{z \to t \\ \text{radially}}} \frac{G(z, re^{i\theta})}{G(z, p)} = \frac{P(re^{i\theta}, t)}{P(p, t)},$$
where p is any fixed reference point in the disk. Hence, except for the positive factor P(p, t), which depends on t but not on $re^{i\theta}$, the Poisson kernel is equal to
$$\lim_{\substack{z \to t \\ \text{radially}}} \frac{G(z, re^{i\theta})}{G(z, p)}. \qquad (**)$$

Therefore this last function may be used in the representation (*) in 
place of the Poisson kernel; the distinction between the kernels is just a 
normalizing factor (depending on t) which can be absorbed by changing 
the measures. 

Two comments are in order. First, the limit in (**) need not be 
taken radially. Any method of approach of z to t, as long as z stays in 
the interior of the disk, will give the same value. Second, the con- 
siderations above apply equally well to any domain in n-dimensional 
space with a sufficiently smooth boundary. Although the explicit 
form of the kernel will vary from region to region, it will always be 
connected to the Green’s function in the way we have just described. 

R. S. Martin [1941] made use of these observations to define an ideal boundary for an arbitrary domain in Euclidean space. If the Green's function for the region is denoted G(z, y), he noted that points t on the ordinary topological boundary of the region did not necessarily have the property that
$$\lim_{z \to t} \frac{G(z, y)}{G(z, p)}$$
exists. He suggested that distinct ideal boundary points u should be associated to subsequences $\{z_n\}$ which yield distinct values for the limits
$$\lim_{z_n \to t} \frac{G(z_n, y)}{G(z_n, p)} = K(y, u).$$
He went on to show that the desired representation theorem is indeed obtained in terms of this boundary and the kernel K(y, u).
Doob [1959], taking advantage of the fact that the N-matrix for a transient Markov chain is the analog of the Green's function (see Proposition 7-4), showed that Martin's approach could be used to obtain a boundary for Markov chains. (We remark that $N_{ij}$ corresponds to G(j, i) with the indices in the reversed order.) As the analog of Martin's kernel he used limits on j of expressions of the form $N_{ij}/N_{0j}$.

There is a minor restriction imposed by the Doob approach, namely that $N_{0j}$ is assumed non-zero for all j. For a more general chain in which it is not possible to get from state 0 to every other state, what Doob did was to consider only those states that could be reached from 0. We shall not follow him in this respect. We simply use limits of $N_{ij}/(\pi N)_j$ instead, where π is a probability vector such that πN is strictly positive. In terms of this kernel there is a natural space to try as the one corresponding to the closed unit disk. The space should consist of one point for each possible limit of $N_{ij}/(\pi N)_j$. Actually we shall find that this space is too large: the space has to be cut down a bit for the representation to be unique. The price of uniqueness is that the cut-down space is not compact.

The introduction of π in place of 0 itself leads to a problem. The representation will have to be restricted to π-integrable regular functions h, those for which πh is finite. This requirement evaporated in Martin's or Doob's treatment because for them π assigned unit mass to a point 0 and h(0) was automatically finite.

Hunt [1960] gave a new approach to Martin boundary theory for 
Markov chains which was more probabilistic in nature than Doob’s. 
We follow Hunt’s probabilistic approach, except that we use a different 
metric and get a boundary which is more like Doob’s. 


2. Extended chains 


We begin by introducing the machinery which we shall use in later sections to develop Martin boundary theory for Markov chains. We are going to use a broader notion of Markov chain than we have been considering so far, namely, a process whose time index starts at −∞ and whose behavior is Markovian only after it has entered certain sets.

That is, we extend the concept of Markov chain in two ways. First, we shall allow any finite measure π as a starting distribution. This is only a minor modification in the theory and is a convenience in that it removes the necessity of normalization in certain constructions. Second, we shall extend the Markov chain to the past, that is, to a stochastic process $\{x_n\}$ where n runs through all the integers (including negative integers). This second extension is an essential one and will be the main topic of this section.
First we must extend the concept of a stochastic process. The state space will be a denumerable set S (with at least two elements) and two distinguished other states a and b. The underlying set Ω of the measure space will consist of all doubly infinite sequences
$$\omega = (\ldots, c_{-2}, c_{-1}, c_0, c_1, c_2, \ldots)$$
such that

(1) $c_n \in S$ or $c_n = a$ or $c_n = b$.
(2) If $c_n = a$, then $c_m = a$ for all $m < n$.
(3) If $c_n = b$, then $c_m = b$ for all $m > n$.
(4) $c_n \in S$ for at least one n.

The interpretation of (2) and (3) is that state a stands for "not yet started" and state b stands for "stopped." Thus (4) has the meaning that each path in Ω represents some nontrivial possibility for the process. We shall refer to Ω as a double sequence space.

We define the outcome functions $x_n$ as usual except that n may be any integer. A basic cylinder set is any truth set in Ω of a statement of the form
$$x_m = c_m \wedge x_{m+1} = c_{m+1} \wedge \cdots \wedge x_n = c_n,$$
where at least one $c_i$ is an element of S. The field generated by the basic cylinder sets is denoted $\mathcal{F}$, and the smallest Borel field containing $\mathcal{F}$ is $\bar{\mathcal{F}}$.


Definition 10-1: An extended stochastic process $\{x_n\}$ is the set of outcome functions for a measure space $(\Omega, \bar{\mathcal{F}}, \Pr)$ such that

(1) Ω is a double sequence space with state space S ∪ {a, b}.
(2) $\bar{\mathcal{F}}$ is the smallest Borel field containing the field of cylinder sets of Ω.
(3) $\Pr[\{\omega \mid x_n = i\}] < \infty$ for every integer n and every i in S.


Note that we do not augment the measure space $(\Omega, \bar{\mathcal{F}}, \Pr)$ by allowing all subsets of sets of measure zero to be measurable.

We shall use interchangeably the notations Pr[P] and Pr[p], where P is the truth set of the statement p. Thus the third condition may be replaced by the condition $\Pr[x_n = i] < \infty$ for all n and for all i in S. From it and from condition (4) in the definition of Ω, we find that Pr must be sigma-finite. However, the measure Pr need not be finite.

As promised, Definition 10-1 extends the definition of stochastic 
process in two ways: The time index n runs through all the integers, 
and the measure Pr need not be a probability measure or even a finite 
measure. 
In examples it is necessary to have a method of constructing the measures for extended stochastic processes. If the measure Pr has already been defined consistently on basic cylinder sets, we need a version of the Kolmogorov Theorem to prove that Pr is completely additive on $\mathcal{F}$. Theorem 1-19 then will give the extension of Pr to all of $\bar{\mathcal{F}}$. At this point, therefore, we stop to outline a proof that Pr is completely additive on $\mathcal{F}$.

Now the argument of Lemma 2-1 easily shows that Pr is non-negative and additive. Extend Ω to include the set of all doubly infinite sequences of a's, b's, and elements of S, and define Pr to be zero on all cylinder subsets of the set added. Then if Pr were a finite
measure, we could temporarily rearrange the time scale and then con- 
clude complete additivity by Theorem 2-4. But, in general, Pr is 
merely sigma-finite and therefore we shall write it as the countable sum 
of totally finite non-negative additive set functions, each of which is a 
measure on cylinder sets depending on a bounded time interval. 
Each of the summands is completely additive by the above argument, 
and therefore Pr is completely additive by Lemma 1-3. Thus all we 
need to do is decompose Pr as such a sum. The countable family of statements indexed by $i \in S$ and by the integer n, consisting of
$$x_n = a \wedge x_{n+1} = i,$$
is a disjoint exhaustive family in the original double sequence space, and each statement is assigned finite Pr-measure. For each of these statements q, define
$$\Pr^{(q)}(E) = \Pr[E \cap \{\omega \mid q\}]$$
for E in the field of cylinder sets. Then the family $\{\Pr^{(q)}\}$ is the required family of set functions.
We now fix our attention on a single extended stochastic process $\{x_n\}$. Although we may be dealing with an infinite measure space, the conditional probability
$$\Pr[p \mid q] = \Pr[p \wedge q]/\Pr[q]$$
is still well defined as long as $0 < \Pr[q] < \infty$. We define $\Pr[p \mid q]$ to be zero if $\Pr[q] = 0$.


Definition 10-2: For $E \subset S$ and for any $\omega \in \Omega$ such that $x_n(\omega) \in E$ for some n, let $u_E(\omega)$ be the infimum of all n such that $x_n(\omega) \in E$ and let $v_E(\omega)$ be the supremum of all such n. Define $u = u_S$ and $v = v_S$; u and v are called the initial time and the final time, respectively.

By condition (4) of the definition of Ω, we see that u(ω) and v(ω) are defined for all ω. Moreover, $u_E(\omega) \le v_E(\omega)$ whenever $u_E(\omega)$ and $v_E(\omega)$ are defined. The values $u_E(\omega) = -\infty$ and $v_E(\omega) = +\infty$ are possible. If $x_n(\omega) \in E$ for some n, we have
$$x_n(\omega) = \begin{cases} a & \text{if } n < u(\omega) \\ \text{element of } S - E & \text{if } u(\omega) \le n < u_E(\omega) \\ \text{element of } S - E & \text{if } v_E(\omega) < n \le v(\omega) \\ b & \text{if } v(\omega) < n. \end{cases}$$


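Proposition 10-3: The functions $u_E$ and $v_E$ have an $\bar{\mathcal{F}}$-measurable subset of Ω as domain and are each $\bar{\mathcal{F}}$-measurable.

Proof: We prove the result for $u_E$. We have
$$\{\omega \mid u_E(\omega) \le k\} = \bigcup_{n=-\infty}^{k} \bigcup_{i \in E} \{\omega \mid x_n(\omega) = i\},$$
and the union of these sets on k is the domain of $u_E$.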


Definition 10-4: Let E be a subset of S and define
$$y_n(\omega) = x_{n + u_E(\omega)}(\omega)$$
for all $n \ge 0$ and for all ω such that $u_E(\omega) > -\infty$. Let $\Omega'$ be the ordinary sequence space with state space S ∪ {b}, with the measure of each measurable set $A \subset \Omega'$ defined to be
$$\Pr[\{\omega \mid (y_0(\omega), y_1(\omega), \ldots) \in A\}].$$
The measure space $\Omega'$ and its outcome functions together are called the process watched starting in E.


The process watched starting in a set E is an ordinary stochastic process, except that the starting distribution need not be a probability measure and can possibly be infinite.

Let $j \in S$. The mean number $\nu_j$ of times that the process $\{x_n\}$ is in j can, as usual, be computed by
$$\nu_j = \sum_n \Pr[x_n = j],$$
except that the sum is over all integers n. By Definition 10-1, each summand is finite, but the sum may be infinite.
Definition 10-5: An extended stochastic process $\{x_n\}$ is an extended chain with transition matrix P if the following conditions hold for each finite subset E of S:

(1) The domain of $u_E$ has positive measure.
(2) $\Pr[u_E = -\infty] = 0$.
(3) The process watched starting in E is a Markov chain with transition matrix P and finite starting measure (but not necessarily a probability measure).
(4) For all $j \in S$, $\nu_j < \infty$.

Note that the transition matrix P of an extended chain necessarily satisfies $P\mathbf{1} = \mathbf{1}$. The state space of P never needs to be any bigger than S ∪ {b}, but as we shall see shortly it must contain S. If b is in the state space, it clearly must be an absorbing state.

From the definition of the process watched starting in E, we see that the total starting measure for the process is equal to the measure of the set of paths on which there is a first entry to E. That is, it is the measure of the set where $u_E > -\infty$. By conditions (1) and (2), this measure is positive. Hence the process watched starting in E has a starting measure which is not identically zero.

Applying this observation to the one-point set {j}, we see that j must be included in the state space of P.

Let $\{x_n\}$ be an extended stochastic process satisfying (1), (2), and (3). We shall derive as Proposition 10-6 a necessary and sufficient condition for (4) to hold. Let E be a finite subset of S. We introduce the abbreviations
$$\mu^{E,m}_i = \Pr[u_E = m \wedge x_m = i]$$
and
$$\mu^E_i = \sum_m \mu^{E,m}_i.$$
Then $\mu^E$ is the starting measure of the process watched starting in E and is a finite measure with support in E by (3). Our remarks above showed that $\mu^E$ is not identically zero.

Since the process watched starting in E is a Markov chain and since (2) holds, the following computation is justified: For $j \in E$,
$$\Pr[x_n = j \wedge x_{n+1} = k \wedge x_{n+2} = s] = \sum_{m,i} \mu^{E,m}_i P^{(n-m)}_{ij} P_{jk} P_{ks},$$
where $P^{(n-m)} = 0$ if $n < m$. A similar computation of Pr[p] is possible for any p such that p is false if E is never entered and p depends only on outcomes after E is entered.

As an application of this calculation, we can relate condition (4) to 
properties of P. 
Proposition 10-6: Let $\{x_n\}$ be an extended stochastic process satisfying conditions (1), (2), and (3) for an extended chain. Then for $j \in E$,
$$\nu_j = (\mu^E N)_j.$$
Moreover, the process is an extended chain if and only if P is the transition matrix of a transient chain whose transient states are S and which may have b as an absorbing state.

Proof: We have
$$\nu_j = \sum_n \Pr[x_n = j] = \sum_{n,m,i} \mu^{E,m}_i P^{(n-m)}_{ij} = \sum_{m,i} \mu^{E,m}_i N_{ij} = \sum_i \mu^E_i N_{ij},$$
or
$$\nu_j = (\mu^E N)_j.$$
Taking E = {j}, we find
$$\nu_j = \mu^{\{j\}}_j N_{jj}.$$
Now $\mu^{\{j\}}_j$ is assumed finite by (3) and it is strictly positive by (1). Hence $\nu_j$ is finite for all $j \in S$ if and only if all elements of S are transient for a Markov chain with transition matrix P. Furthermore, a cannot occur as a state, and if b occurs, it must be an absorbing state.

Corollary 10-7: Let $\{x_n\}$ be an extended stochastic process satisfying conditions (1), (2), and (3) for an extended chain. Then $\nu_j > 0$ for all j.

Proof: We have $\nu_j = \mu^{\{j\}}_j N_{jj}$, and each factor on the right side is positive.


An important example, but by no means the most general example,
of an extended chain is obtained as follows. Let $P$ be a Markov chain
with all states transient, and let $\pi$ be a starting distribution such that
$\pi N$ is strictly positive. (For instance, let $\pi$ assign weight $2^{-n}$ to the
$n$th state.) Let $\bar P$ be the enlarged chain obtained by adding the
absorbing state $b$. Form an extended stochastic process by defining a
measure on cylinder sets as follows: every basic statement containing
the assertion $x_n = i$ for $n < 0$ and $i \neq a$, or the assertion $x_m = a$ for
$m \ge 0$, gets probability zero. The statement

$$x_{-m} = a \wedge \cdots \wedge x_{-1} = a \wedge x_0 = i_0 \wedge x_1 = i_1 \wedge \cdots \wedge x_n = i_n$$

gets probability $\pi_{i_0} \bar P_{i_0 i_1} \cdots \bar P_{i_{n-1} i_n}$. The probabilities of all other basic
cylinder statements can be obtained from these by adding the
probabilities of a suitable number of statements of the form just described.
The claim is that this extended stochastic process $\{x_n\}$ is an extended
chain with transition matrix $\bar P$. Property (1) follows from the fact
that $\pi N > 0$. Property (2) is an immediate consequence of the
definition of the process. Property (3) follows from Theorem 4-9;
the starting distribution for the process watched starting in $E$ is $\pi B^E$.
Finally, Property (4) comes from Proposition 10-6; alternatively, we
could compute directly that the mean number of times in state $j$ is
$(\pi N)_j$. We shall call this process the extended chain associated with $\pi$
and $P$.
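For a finite chain the mean-time vector $(\pi N)_j$ of the extended chain associated with $\pi$ and $P$ can be computed exactly. The sketch below assumes a hypothetical three-state substochastic matrix `P` and distribution `pi` (neither is from the text), and obtains $N$ as $(I - P)^{-1}$, which is valid here because every state is transient. The helper names `inverse`, `fundamental_matrix`, and `vec_mat` are our own.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    """N = (I - P)^(-1) = sum_n P^n, valid when every state of P is transient."""
    n = len(P)
    return inverse([[Fraction(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


# Hypothetical example: three transient states, each row losing mass 1/4 to b.
P = [[Fraction(0), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(0), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 4), Fraction(0)]]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

N = fundamental_matrix(P)
v = vec_mat(pi, N)  # v_j = (pi N)_j: mean number of times in state j
# Since N P = N - I, v is the potential of the charge pi: v = pi + v P.
```

Because every row of this `P` has mass $3/4$, each row of `N` sums to $1/(1 - 3/4) = 4$, which gives a quick sanity check on the inversion.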

If $\{x_n\}$ is an extended chain and $E$ is a finite set in $S$, we define
$v^E = \mu^E N$. We know that the process watched starting in $E$ is a
Markov chain with starting distribution $\mu^E$. Hence $v^E_j$ is the mean
number of times in $j$ in this Markov chain; that is, it is the mean
number of times in $j$ after entering $E$ in the extended chain. From this
interpretation we see that $v^E$ is monotone increasing in $E$ and that
$v^E_j = v_j$ for $j \in E$. In order to get an interpretation of $v^E$ and $\mu^E$ in
terms of $v$, we shall generalize the notion of balayage potential as
defined in Chapter 8.

If $P$ is a Markov chain with all states transient, if $h$ is a non-negative
finite-valued superregular function, and if $E$ is a finite set of states, we
define the balayage potential of $h$ on $E$ to be the function $B^E h$. By
Lemma 8-22, $B^E h$ is a pure potential with support in $E$. Now $B^E h$ is
the unique pure potential with support in $E$ which agrees with $h$ on $E$,
since if $h'$ is another such potential, $h'$ must be the unique balayage
potential of $B^E h$ on $E$ (Theorem 8-46) and hence must equal $B^E h$.
Moreover, if $\{E_n\}$ is an increasing sequence of finite sets with union the
set of all states, then the charges of $B^{E_n} h$, namely $(I - P)(B^{E_n} h)$,
converge to $(I - P)h$, since

$$\lim_n P B^{E_n} h = Ph$$

by part (4) of Proposition 8-16 and by monotone convergence.

Let us dualize these results. Let $\gamma$ be a non-negative finite-valued
superregular measure, and let $E$ be a finite set. Then there is a unique
pure potential measure with support in $E$ which agrees with $\gamma$ on $E$.
This potential is defined to be the balayage potential of $\gamma$ on $E$. It
has the property that if $\{E_n\}$ is an increasing sequence of sets with union
the set of all states, then the balayage charges of $\gamma$ on $E_n$ converge to
$\gamma(I - P)$. By Proposition 6-16, the balayage potential of $\gamma$ on $E$ is

$$\big(\gamma_E (N_E)^{-1},\ 0\big) N,$$

where $N_E$ denotes the restriction of $N$ to $E \times E$, and

$$N_E = \sum_{n=0}^{\infty} Q^n, \qquad Q = P^E,$$

the transition matrix of the chain watched in $E$.
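The balayage formula $\big(\gamma_E (N_E)^{-1}, 0\big) N$ can be checked on a small finite chain. The sketch below is an illustration under assumptions: the chain `P`, the distribution `pi`, the set `E`, and the helper `balayage_of_measure` are hypothetical, and we take the superregular measure $\gamma = \pi N$. It confirms that the resulting potential agrees with $\gamma$ on $E$, that its charge lives on $E$, and (for this choice of $\gamma$) that it lies below $\gamma$.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    n = len(P)
    return inverse([[Fraction(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def balayage_of_measure(gamma, N, E):
    """Balayage of the measure gamma on the finite set E (a list of states).

    Charge: beta = (gamma_E (N_E)^(-1), 0); potential: gamma^E = beta N.
    """
    N_E = [[N[i][j] for j in E] for i in E]
    coeffs = vec_mat([gamma[i] for i in E], inverse(N_E))
    beta = [Fraction(0)] * len(N)
    for pos, i in enumerate(E):
        beta[i] = coeffs[pos]
    return beta, vec_mat(beta, N)


P = [[Fraction(0), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(0), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 4), Fraction(0)]]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
N = fundamental_matrix(P)

gamma = vec_mat(pi, N)  # a superregular measure: the potential pi N
E = [0, 1]              # hypothetical finite set
beta, gamma_E = balayage_of_measure(gamma, N, E)
```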
If we let $\gamma^E$ be the balayage potential of $\gamma$ on $E$ and if $\beta^E$ is the charge
of $\gamma^E$, then $E \subset F$ implies $\beta^E = \beta^F B^E$. To see this equality, we note
that $\beta^E$ and $\beta^F B^E$ are both pure charges with support in $E$ and that the
potential of $\beta^F B^E$, when restricted to $E$, is

$$(\beta^F B^E N)_E = (\beta^F N - \beta^F\, {}^E N)_E = (\beta^F N)_E = \gamma_E = (\beta^E N)_E$$

by Lemma 8-17. Hence the potentials of $\beta^E$ and $\beta^F B^E$ agree on $E$,
and they must therefore agree everywhere. Thus

$$\beta^E = \beta^F B^E$$

by Theorem 8-4.


Our characterization of $v^E$ and $\mu^E$ in terms of balayage potentials is
the content of the next proposition.


Proposition 10-8: For every extended chain with transition matrix $P$,


(1) $v$ is a superregular measure for $P_S$.

(2) $v(I - P_S) = \mu$, where $\mu = \lim_E \mu^E$ as $E$ increases to $S$.

(3) $v^E$ and $\mu^E$ are the balayage potential and charge, respectively, for
$v$ on $E$.


Proof: We have

$$v^E P_S = \mu^E N P_S = \mu^E (N - I) = v^E - \mu^E .$$

Along any increasing sequence of sets $E_n$ with union $S$, $v^{E_n}$ increases to
$v$. Hence by monotone convergence

$$v P_S = v - \lim_n \mu^{E_n} .$$

Since the left side does not depend on the sequence, this equality implies that

$$v P_S = v - \lim_{E \uparrow S} \mu^E$$

and proves (1) and (2). To prove (3) we need only remark that $v^E$ is a
pure potential with support in $E$ which agrees with $v$ on $E$. Hence it
must be the balayage potential.
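For the extended chain associated with $\pi$ and $P$ on a finite state space, Proposition 10-8 can be checked numerically. The sketch assumes a hypothetical three-state chain and takes $\mu^E = \pi B^E$, the starting distribution of the process watched in $E$ mentioned earlier in this section; `hitting_matrix` is our own helper, computing $B^E$ by solving $B_{CE} = (I - P_{CC})^{-1} P_{CE}$ on the complement $C$ of $E$. Here $P_S = P$ because every state is transient.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    n = len(P)
    return inverse([[Fraction(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def hitting_matrix(P, E):
    """B[i][k]: probability that the first visit (at time >= 0) to E is at k."""
    n = len(P)
    C = [i for i in range(n) if i not in E]
    B = [[Fraction(0)] * n for _ in range(n)]
    for k in E:
        B[k][k] = Fraction(1)
    if C:
        G = inverse([[Fraction(int(a == b)) - P[a][b] for b in C] for a in C])
        for x, a in enumerate(C):
            for k in E:
                B[a][k] = sum(G[x][y] * P[c][k] for y, c in enumerate(C))
    return B


P = [[Fraction(0), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(0), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 4), Fraction(0)]]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
N = fundamental_matrix(P)

E = [0, 2]                                # hypothetical finite set
mu_E = vec_mat(pi, hitting_matrix(P, E))  # mu^E = pi B^E
v = vec_mat(pi, N)                        # mean times of the extended chain
v_E = vec_mat(mu_E, N)                    # v^E = mu^E N
```

The identity $v^E P_S = v^E - \mu^E$ from the proof, and part (3) in the form $\mu^E_E = v_E (N_E)^{-1}$, are what the checks below exercise.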


Proposition 10-8 has as a converse the following theorem, which 
asserts roughly that any superregular measure can be represented as 
the vector of mean times in the states of some extended chain. This 
result will not be used until Section 11, and its proof, which is quite 
long, will not be given until after that section. 


Theorem 10-9: Let $P^*$ be the transition matrix of a transient chain
with $P^* 1 = 1$ and with at most one absorbing state $b$, and let $v$ be a
non-negative finite-valued measure defined on the transient states and
superregular for the restriction of $P^*$ to the transient states. Let $S$
be the support of $v$, and suppose $S$ has at least two elements. Then
there is an extended chain $\{x_n\}$ having $P^*_{S \cup \{b\}}$ as transition matrix and
having $v$ as its vector of mean times in the states of $S$. Furthermore, if

$$v_S = \mu N + \rho$$

is the unique decomposition of $v_S$ with $\rho$ regular for $P^*_S$, then $\mu N$ is
contributed by the paths $\omega$ with $u(\omega) > -\infty$ and $\rho$ is contributed by
the paths with $u(\omega) = -\infty$.


To conclude this section we shall define what we mean by the reverse 
of an extended stochastic process, and we shall prove that the reverse of 
an extended chain is an extended chain. The transition matrix of the 
reverse, at least when restricted to S, will turn out to be the v-dual of 
the transition matrix, restricted to S, of the original process. We 
need the following lemma, whose proof uses the calculation preceding 
Proposition 10-6. 


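Lemma 10-10: If $\{x_n\}$ is an extended chain with transition matrix $P$
and if $k$ is in a finite set $E$ in $S$, then

$$\Pr[x_{\nu_E - 2} = i \wedge x_{\nu_E - 1} = j \wedge x_{\nu_E} = k] = v_i P_{ij} P_{jk} e^E_k,$$

where $e^E$ is the escape vector for $P$.


Proof:

$$\begin{aligned}
\Pr[x_{\nu_E - 2} = i \wedge x_{\nu_E - 1} = j \wedge x_{\nu_E} = k]
&= \sum_n \Pr[x_n = i \wedge x_{n+1} = j \wedge x_{n+2} = k \wedge \{x_n\} \text{ not in } E \text{ after time } n+2] \\
&= \sum_{m,s,n} \mu_s^{E,m} P^{(n-m)}_{si} P_{ij} P_{jk} e^E_k \qquad \text{by the calculation} \\
&= \sum_n \Pr[x_n = i]\, P_{ij} P_{jk} e^E_k \qquad \text{by the calculation again} \\
&= v_i P_{ij} P_{jk} e^E_k .
\end{aligned}$$

For any point $\omega$ in $\Omega$ we define $\omega'$ to be the point in $\Omega$ with

$$x_n(\omega') = \begin{cases} x_{-n}(\omega) & \text{if } x_{-n}(\omega) \in S, \\ a & \text{if } x_{-n}(\omega) = b, \\ b & \text{if } x_{-n}(\omega) = a. \end{cases}$$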
Definition 10-11: Let $\{x_n\}$ be an extended chain defined on $\Omega$ with
measure $\Pr$. Set

$$\Pr'[\omega \in A] = \Pr[\omega' \in A]$$

for all sets $A$ for which the right side is defined. The extended
stochastic process on $\Omega$ defined by the measure $\Pr'$ is called the reverse of
the extended chain.


Proposition 10-12: If $\{x_n\}$ is an extended chain with transition matrix
$P$, then its reverse is also an extended chain, and its transition matrix $\hat P$
satisfies

$$\hat P_{ij} = v_j P_{ji} / v_i$$

for all states $i$ and $j$ in $S$.


Proof: From the definition of $\omega'$, we see that

$$u_E(\omega') = -\nu_E(\omega).$$

Hence

$$\Pr'[\omega \in \operatorname{domain} u_E] = \Pr[\omega \in \operatorname{domain} \nu_E] = \Pr[\omega \in \operatorname{domain} u_E] > 0,$$

and (1) holds. Since the chain watched starting in $E$ is in the finite set
$E$ infinitely often with probability 0 (second half of Proposition 10-6),

$$0 = \Pr[\nu_E = +\infty] = \Pr'[u_E = -\infty].$$

Thus (2) holds.

Next, we show that the reverse process watched starting in $E$ is a
Markov chain with transition matrix $\hat P$. We shall compute only a
typical conditional probability. Let $k \in E$, and first suppose $i \neq b$.
Since

$$\Pr'[x_{u_E} = k \wedge x_{u_E+1} = j \wedge x_{u_E+2} = i] = \Pr[x_{\nu_E-2} = i \wedge x_{\nu_E-1} = j \wedge x_{\nu_E} = k] = v_i P_{ij} P_{jk} e^E_k$$

by Lemma 10-10, we have, provided the condition has positive $\Pr'$-measure,

$$\Pr'[x_{u_E+2} = i \mid x_{u_E} = k \wedge x_{u_E+1} = j] = \frac{v_i P_{ij} P_{jk} e^E_k}{v_j P_{jk} e^E_k} = \frac{v_i P_{ij}}{v_j} = \hat P_{ji}.$$

Next we compute the typical probability

$$\Pr'[x_{u_E} = k \wedge x_{u_E+1} = j \wedge x_{u_E+2} = b].$$

Since we know $b$ is absorbing, we may assume $j \neq b$. Then this
probability is

$$\Pr'[x_{u_E} = k \wedge x_{u_E+1} = j] - \Pr'[x_{u_E} = k \wedge x_{u_E+1} = j \wedge x_{u_E+2} \in S] = v_j P_{jk} e^E_k - \sum_{i \in S} v_i P_{ij} P_{jk} e^E_k .$$

Hence if $\Pr'[x_{u_E} = k \wedge x_{u_E+1} = j] > 0$, then

$$\Pr'[x_{u_E+2} = b \mid x_{u_E} = k \wedge x_{u_E+1} = j] = \Big(v_j - \sum_{i \in S} v_i P_{ij}\Big) \Big/ v_j .$$

(Notice that this probability is non-negative because $v$ is $P_S$-superregular.)
Therefore the reverse process watched starting in $E$ is a Markov chain.
The total starting measure is finite for the reverse process watched
starting in $E$ because

$$\sum_i \Pr'[x_{u_E} = i] = \sum_i \Pr[x_{\nu_E} = i] = \sum_{i \in E} v_i e^E_i < \infty$$

by Lemma 10-10, by (4) for the given process, and by the finiteness of the set $E$.
Hence (3) holds.

Finally we prove (4) for the reverse process. The same argument
as in Proposition 6-12 shows that

$$\hat N_{ij} = v_j N_{ji} / v_i .$$

Hence $\hat P$ is transient, and (4) holds by Proposition 10-6.
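The $v$-dual of Proposition 10-12 can be illustrated on a finite chain where $v = \pi N$. The sketch below (hypothetical chain `P` and distribution `pi`, our own helper names) builds $\hat P_{ij} = v_j P_{ji}/v_i$ and checks three facts from the proof: $\hat N_{ij} = v_j N_{ji}/v_i$, the rows of $\hat P$ have mass at most one because $v$ is superregular, and the mass sent to $b$ from $i$ equals $(v_i - \sum_j v_j P_{ji})/v_i$, which here is $\pi_i / v_i$.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    n = len(P)
    return inverse([[Fraction(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


P = [[Fraction(0), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(0), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 4), Fraction(0)]]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
n = len(P)

N = fundamental_matrix(P)
v = vec_mat(pi, N)  # mean times of the extended chain associated with pi and P

# v-dual of P: P_hat[i][j] = v_j P_ji / v_i.
P_hat = [[v[j] * P[j][i] / v[i] for j in range(n)] for i in range(n)]
N_hat = fundamental_matrix(P_hat)
```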


3. Martin exit boundary 


We now define the Martin exit boundary of a transient chain with
respect to a given starting distribution. With this boundary we shall
be able to describe the long-range behavior of the process, and we shall
obtain a Poisson integral-type representation for all finite-valued
non-negative superregular functions which are integrable with respect to
the starting distribution.

Let $P$ be a Markov chain with all states transient, and let $\pi$ be a
starting vector ($\pi \ge 0$ and $\pi 1 = 1$). Throughout our discussion $P$
and $\pi$ will be fixed. The vector $\pi N$ is non-negative, finite-valued, and
superregular. Ordinarily we shall assume that $\pi$ has been chosen so
that $\pi N$ is strictly positive; that is, so that there is positive probability
of reaching any state eventually. But for technical reasons which will
arise when we consider $h$-processes, it will be convenient to adopt
conventions about what to do when $\pi N$ has some zero entries. These
conventions we shall discuss at the end of this section, and except when
stated otherwise we shall assume that $\pi N$ is everywhere positive. Let
$S$ be the state space of $P$.

Since $N_{\pi j} = (\pi N)_j$ has been assumed positive, we may define

$$K(i, j) = N_{ij} / N_{\pi j} .$$

The notation $K(\cdot, j)$ will mean $K(i, j)$ considered as a function of the
first variable with $j$ fixed. Then for each $j$, $K(\cdot, j)$ is a non-negative
finite-valued superregular function with $\pi K(\cdot, j) = 1$. It is regular
everywhere except at $j$, where it is strictly superregular.

For fixed $i$ the function $K(i, \cdot)$ is bounded, since

$$K(i, j) = \frac{N_{ij}}{N_{\pi j}} = \frac{1}{N_{\pi i}} \left( \frac{N_{\pi i} N_{ij}}{N_{\pi j}} \right) = \frac{1}{N_{\pi i}} \hat N_{ji} \le \frac{1}{N_{\pi i}} \hat N_{ii} = \frac{N_{ii}}{N_{\pi i}},$$

where carets denote duality with respect to $\pi N$.
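The properties of the Martin kernel just stated can be verified exactly on a finite example. The sketch assumes a hypothetical three-state chain and checks that $\pi K(\cdot, j) = 1$, that $(PK(\cdot, j))_i = K(i, j) - \delta_{ij}/N_{\pi j}$ (regular off $j$, strictly superregular at $j$, since $PN = N - I$), and the bound $K(i, j) \le N_{ii}/N_{\pi i}$. Helper names are our own.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    n = len(P)
    return inverse([[Fraction(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


P = [[Fraction(0), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(0), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 4), Fraction(0)]]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
n = len(P)

N = fundamental_matrix(P)
N_pi = vec_mat(pi, N)  # N_{pi j} = (pi N)_j, positive here
K = [[N[i][j] / N_pi[j] for j in range(n)] for i in range(n)]  # K(i, j)


def P_times_column(j):
    """Column vector P K(., j)."""
    return [sum(P[i][k] * K[k][j] for k in range(n)) for i in range(n)]
```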
The real-valued functions $d_i(j, j')$ defined on $S \times S$ by

$$d_i(j, j') = |K(i, j) - K(i, j')|$$

have the property that $\{K(i, j_n)\}$ is a Cauchy sequence for all $i$ in $S$ if
and only if $\lim_{m,n \to \infty} d_i(j_m, j_n) = 0$ for all $i$. According to the bound we
just computed for $K(i, j)$, we may lump the functions $d_i$ into the single
finite-valued function $d$ defined by

$$d(j, j') = \sum_{i \in S} w_i N_{\pi i} |K(i, j) - K(i, j')|,$$

where the $w_i$ are positive weights such that $\sum_i w_i N_{ii}$ is finite (each term
of the sum is at most $2 w_i N_{ii}$).
We shall show that $d$ is a metric for $S$. Clearly $d$ satisfies all the
conditions of a metric except possibly that $d(j, j') = 0$ implies $j = j'$.
But if $d(j, j') = 0$, then, since $w_i N_{\pi i} > 0$ for all $i$, we must have

$$K(i, j) = K(i, j') \quad \text{for all } i .$$

Multiplying through by $P$ and supposing that $j \neq j'$, we obtain

$$K(j', j) = \sum_i P_{j'i} K(i, j) = \sum_i P_{j'i} K(i, j') < K(j', j'),$$

the strict inequality holding since $K(\cdot, j')$ is not regular at $j'$. This
contradicts $K(j', j) = K(j', j')$. We conclude that $j = j'$ and that $d$ is a metric.
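A direct computation of $d$ on a finite chain shows the metric axioms at work. The sketch assumes a hypothetical three-state chain and the hypothetical weights $w_i = 2^{-(i+1)}$ (any positive weights give a finite sum when $S$ is finite). It checks symmetry, positivity for distinct states, the triangle inequality, and the termwise bound by $2 w_i N_{ii}$.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    n = len(P)
    return inverse([[Fraction(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


P = [[Fraction(0), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(0), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 4), Fraction(0)]]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
n = len(P)

N = fundamental_matrix(P)
N_pi = vec_mat(pi, N)
K = [[N[i][j] / N_pi[j] for j in range(n)] for i in range(n)]
w = [Fraction(1, 2 ** (i + 1)) for i in range(n)]  # hypothetical positive weights


def d(j, jp):
    """d(j, j') = sum_i w_i N_{pi i} |K(i, j) - K(i, j')|."""
    return sum(w[i] * N_pi[i] * abs(K[i][j] - K[i][jp]) for i in range(n))
```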


Proposition 10-13: A sequence $\{j_n\}$ in the metric space $(S, d)$ is
Cauchy if and only if the sequence of real numbers $\{K(i, j_n)\}$ is Cauchy
for every $i$.


Proof: If $\{j_n\}$ is Cauchy, then certainly $\{w_i N_{\pi i} K(i, j_n)\}$ is Cauchy.
Since $w_i N_{\pi i} > 0$, $\{K(i, j_n)\}$ is Cauchy.
Conversely, let $\{K(i, j_n)\}$ be Cauchy for all $i$, and let $\epsilon > 0$ be given.
Choose a finite set of states $E$ such that

$$\sum_{k \in S - E} w_k N_{kk} < \epsilon / 4,$$

and choose $M$ sufficiently large that

$$|K(i, j_n) - K(i, j_m)| < \frac{\epsilon}{2 \sum_{k \in E} w_k N_{\pi k}}$$

for $i \in E$ and for all $n, m > M$. Then $d(j_n, j_m) < \epsilon$.


We define $S^*$ to be the Cauchy completion of the metric space $(S, d)$,
and we let $B = S^* - S$. The set $B$ is the Martin exit boundary for the
chain $P$ started with distribution $\pi$.

The set $B$ is not necessarily a boundary in the topological sense,
since there are examples in which it is not a closed set in $S^*$, but the
abuse of notation will not disturb us.

From Proposition 10-13 we see that $K(i, \cdot)$ is a uniformly continuous
function on $S$. Hence it extends uniquely to a continuous function
on $S^*$. We shall use the same notation $K(i, \cdot)$ for the function on $S^*$,
but will normally denote points of $B = S^* - S$ by $x$ or $y$.

The characterization of Cauchy sequences given in Proposition 10-13
shows that the nature of the space $S^*$ does not depend upon the choice
of the weights $w_i$. That is, the Cauchy completions of $S$ corresponding
to two different choices of weights are homeomorphic.

Since $K(i, \cdot)$ is continuous on $S^*$, it follows that the extension of $d$
to $S^*$ is simply

$$d(x, y) = \sum_{i \in S} w_i N_{\pi i} |K(i, x) - K(i, y)| .$$

A repetition of the argument in Proposition 10-13 then shows that
$\{x_n\}$ is Cauchy in $S^*$ if and only if $\{K(i, x_n)\}$ is Cauchy for each $i$.
Applying this result to the sequence whose terms are alternately $x$ and
$y$, we find that $x = y$ if and only if $K(i, x) = K(i, y)$ for all $i$.
We state this conclusion as a proposition.


Proposition 10-14: $K(i, x) = K(i, y)$ for all $i$ if and only if $x = y$.

Proposition 10-15: The space $S^*$ is compact.


Proof: Since $S^*$ is a metric space, it is enough to prove that any
sequence $\{x_n\}$ has a convergent subsequence. Now

$$K(i, x_n) \le \sup_{j \in S} K(i, j) \le N_{ii} / N_{\pi i} < \infty .$$

Thus by a diagonal process we may choose a subsequence $\{x_{n_k}\}$ such that
$\{K(i, x_{n_k})\}$ is Cauchy for all $i$. Then $\{x_{n_k}\}$ is Cauchy in $S^*$. Since $S^*$
is complete, $\{x_{n_k}\}$ is convergent.


The sets in the smallest Borel field containing the open sets of S* are 
called Borel sets. The boundary B is a Borel set, since it is the comple- 
ment of a countable set. Finite measures defined on the Borel sets are 
called Borel measures. 

We conclude this section by agreeing on what conventions we shall
adopt in case $\pi N$ has some zero entries. If $S$ is the set of all states, let

$$W = \{i \in S \mid (\pi N)_i > 0\}.$$

The special nature of $W$ implies that $P_W = P^W$, and by Lemma 8-18
we see that the fundamental matrix for $P_W$ is $N_W$. Thus if we agree
that boundary theory for $P$ and $\pi$ is to be interpreted as boundary
theory for $P_W$ and $\pi_W$, we find that for $i$ and $j$ in $W$

$$K(i, j) = N_{ij} / N_{\pi j},$$

and $K(i, j)$ is not defined otherwise. Hence the metric is

$$d(j, j') = \sum_{i \in W} w_i N_{\pi i} |K(i, j) - K(i, j')|$$

for $j$ and $j'$ in $W$. Boundary theory is then done relative to the Cauchy
completion of $(W, d)$.


4. Convergence to the boundary 


We continue to assume that $P$ is a Markov chain with all states
transient and that $\pi$ is a starting distribution with $\pi N > 0$. Let $\bar P$
denote the enlarged chain obtained from $P$ by adding the absorbing
state $b$.

The main theorem of this section will be that with probability one
every path $\omega$ has the property that either $x_n(\omega)$ converges in $S^*$ or the
process along $\omega$ disappears in finite time. From the results of the next
sections we shall be able to sharpen the theorem by concluding that,
when convergence takes place, it is almost everywhere to a nice subset of the boundary.


Lemma 10-16: Let $g$ be a pure potential for $P$ with charge $f$. If
$\{x_n\}$ is an extended chain of total measure one with transition matrix $\bar P$
for which $vf$ is finite, then the limit of $g(x_n)$ as $n$ decreases to $u$ exists a.e.
Proof: Form the process watched starting in the finite set $E$. The
claim is that for $n \ge 0$, $\{g(x_{u_E + n})\}$ is a non-negative supermartingale.
It satisfies the supermartingale inequality because $g$ is non-negative and
superregular. Thus, to show that the means are finite, it is sufficient
to consider $M[g(x_{u_E})]$. We have

$$M[g(x_{u_E})] = \mu^E g = \mu^E N f = v^E f \le v f < \infty,$$

since $f$ is non-negative and $v^E \le v$. Hence $\{g(x_{u_E + n})\}$ is a non-negative
supermartingale.

If $0 < r < s$, then Proposition 3-11 applied to $-g$ (or Proposition
8-79) shows that the mean number of downcrossings of $[r, s]$ by
$\{g(x_{u_E + k})\}$ up to time $n$ is bounded by $s/(s - r)$ independently of $n$ and
of $E$. Let $n \to \infty$ and then let $E$ increase so that $u_E$ approaches $u$.
The mean number of downcrossings of $[r, s]$ remains bounded, and by
monotone convergence the mean number of downcrossings after time
$u$ is finite. By the argument in the proof of Theorem 3-12, $g(x_n)$
converges a.e. as $n$ decreases to $u$. It can be shown that the limit is
finite a.e., but this fact will not be needed.


Lemma 10-17: Let $\mathcal{G}$ be a Borel field of subsets of a set $\Omega$, let $S^*$ be
a compact metric space, and for each $n$ let $f_n: \Omega \to S^*$ be a function
with the property that $f_n^{-1}(E)$ is in $\mathcal{G}$ for all Borel sets $E$. If
$f_n(\omega) \to f(\omega)$ for all $\omega$, then $f^{-1}(E)$ is in $\mathcal{G}$ for all Borel sets $E$.


Proof: First consider the case of a compact set $C$. Let $N_\epsilon(C)$ be the
open set of all points at a distance less than $\epsilon$ from $C$. Then

$$f^{-1}(C) = \bigcap_{k=1}^{\infty} \bigcup_{m=1}^{\infty} \bigcap_{n \ge m} f_n^{-1}\big(N_{1/k}(C)\big).$$

Let $\mathcal{C}$ be the class of all Borel sets $E$ for which $f^{-1}(E) \in \mathcal{G}$. Then $\mathcal{C}$ is
clearly closed under countable unions and complements. Since $\mathcal{C}$
contains all compact sets, it contains all Borel sets.


Theorem 10-18: Let the chain $P$ with all states transient be started
with a distribution $\pi$ such that $\pi N > 0$. For each path $\omega$ let $\nu(\omega)$ be
the supremum of the $n$ such that $x_n(\omega)$ is in $S$. Then a.e. either
$\nu(\omega) < +\infty$ and $x_{\nu(\omega)}(\omega) \in S$, or $\nu(\omega) = +\infty$ and $x_n(\omega)$ converges to a point
$x_{\nu(\omega)}(\omega) \in S^*$ as $n$ tends to infinity. Furthermore, if $\mathcal{G}$ is the least
Borel field containing the cylinder sets for $P$, then the set of $\omega$ where
$x_\nu$ is defined is in $\mathcal{G}$, and the inverse image under $x_\nu$ of any Borel set in
$S^*$ is a set of $\mathcal{G}$.
Proof: First we prove the measurability. The set where $x_\nu$ is defined
is the countable union of the sets $\{\nu = 1\}, \{\nu = 2\}, \ldots$ and the set
$\{\nu = \infty \wedge x_n(\omega) \text{ converges}\}$. All of these except possibly the last are
certainly in $\mathcal{G}$. The last set is

$$A = \{\omega \mid \nu(\omega) = \infty \wedge \limsup_n K(i, x_n(\omega)) = \liminf_n K(i, x_n(\omega)) \text{ for all } i\}$$

and is therefore in $\mathcal{G}$. Now the intersection of $\{\nu = n\}$ with the inverse
image under $x_\nu$ of a Borel set $E$ is certainly in $\mathcal{G}$. Therefore to complete
the proof of the measurability part of the theorem it is sufficient
to prove that the intersection of the set $A$ defined above with $x_\nu^{-1}(E)$
is in $\mathcal{G}$ for every Borel set $E$. In Lemma 10-17 let $\Omega$ be the set $A$,
and let the field be the class of sets $A \cap G$, where $G \in \mathcal{G}$. Since
$A \cap x_n^{-1}(E)$ is in the field for all $E$, the lemma applies and gives the
result immediately.

Next we are to prove the almost-everywhere statement. Form the
extended chain associated with $\pi$ and $P$, as described in Section 2 after
Corollary 10-7. All statements about this extended chain after time
$n = 0$ have the same probabilities as the corresponding statements
about $P$, and the vector $v$ of mean times in the various states is $\pi N$.
It is therefore sufficient to show in the extended chain the convergence
of $K(i, x_n)$ for all $i$.

Let $\{\hat x_n\}$ be the reverse of this extended chain. Since

$$K(i, j) = \hat N_{ji} / N_{\pi i},$$

it suffices to show for each $i$ that in the reverse process $\hat N_{\hat x_n, i}$ converges
a.e. as $n$ decreases to $u$. But $\hat N_{\cdot i}$ is the $\hat P$-potential of a unit charge at $i$.
Since the charge has finite support, $v$ times it must be finite. Therefore
the theorem follows by applying Lemma 10-16 to the potential $\hat N_{\cdot i}$
for the reverse process $\{\hat x_n\}$.


By Theorem 10-18 the statement that $x_\nu$ exists (or equivalently that
$x_\nu \in S^*$) and the statement that $x_\nu \in E$, where $E$ is a Borel set in $S^*$,
are both measurable with respect to the least Borel field containing
all cylinder sets. But $\Pr_i$ is defined on all such statements. Hence

$$\Pr_i[x_\nu \in E]$$

is defined if $E$ is a Borel set in $S^*$.
From now on, we use the notation of Chapter 2 that $\mathcal{F}$ is the field of
cylinder sets for $P$ and $\mathcal{G}$ is the least Borel field containing $\mathcal{F}$.
Corollary 10-19: $\Pr_i[x_\nu \in S^*] = 1$ for every $i \in S$.


Proof: As in the proof of Theorem 10-18, form the extended chain
associated with $P$ and $\pi$. By Theorem 10-18 almost every path in the
process $P$ (started according to $\pi$) satisfies $x_\nu \in S^*$. Hence the same is
true of the extended chain, and hence it is true of those paths in the
extended chain which pass through $i$. On any path $\omega$ which passes
through $i$, $x_\nu(\omega) \in S^*$ if and only if the same holds for the portion of $\omega$
from its first visit to $i$ onward. Therefore by
Definition 10-4, the extended chain watched starting in $\{i\}$ satisfies

$$\Pr_{\mu^{\{i\}}}[x_\nu \in S^*] = \mu^{\{i\}}_i .$$

On the other hand,

$$\Pr_{\mu^{\{i\}}}[x_\nu \in S^*] = \mu^{\{i\}}_i \Pr_i[x_\nu \in S^*].$$

Since $\mu^{\{i\}}_i \neq 0$, we must have $\Pr_i[x_\nu \in S^*] = 1$.


It is to be emphasized that $S^*$ has been constructed for the fixed
starting distribution $\pi$ and that Corollary 10-19 is not the same as
Theorem 10-18 restated for the case where $\pi$ assigns measure one to
the state $i$: the boundaries for different starting distributions may be
different.


5. Poisson–Martin Representation Theorem


The notation $P$, $\pi$, $K(i, x)$, $S^*$, $\mathcal{F}$, and $\mathcal{G}$ of Sections 3 and 4 is still
in force. We shall use $\Pr$ to mean $\Pr_\pi$.

We recall that $\pi K(\cdot, j) = 1$ for all $j$ in $S$. If $j_n \to x$ in $S^*$, then
$K(\cdot, j_n) \to K(\cdot, x)$ and, for all $n$, $\pi K(\cdot, j_n) = 1$. Hence $\pi K(\cdot, x) \le 1$
by Fatou's Theorem. Moreover, we know that $K(\cdot, j)$ is $P$-superregular
for all $j$ in $S$. If $j_n \to x$, then again by Fatou's Theorem,

$$P K(\cdot, x) = P \lim_n K(\cdot, j_n) \le \liminf_n P K(\cdot, j_n) \le \liminf_n K(\cdot, j_n) = K(\cdot, x).$$

That is, $K(\cdot, x)$ is $P$-superregular for all $x$ in $S^*$. These remarks enable
us to prove the following proposition.


Proposition 10-20: If $\nu$ is any Borel measure on $S^*$ with $\nu(S^*) = 1$,
then the function $h$ defined by

$$h_i = \int_{S^*} K(i, x)\, d\nu(x)$$

is finite-valued, non-negative, and superregular and satisfies $\pi h \le 1$.
Proof: It is clearly non-negative, and it is finite-valued because
$K(i, \cdot)$ is bounded. By Fubini's Theorem,

$$\pi h = \sum_i \pi_i \int_{S^*} K(i, x)\, d\nu(x) = \int_{S^*} \Big[\sum_i \pi_i K(i, x)\Big] d\nu(x) \le \int_{S^*} 1\, d\nu(x) = 1$$

and

$$(Ph)_i = \sum_j P_{ij} \int_{S^*} K(j, x)\, d\nu(x) = \int_{S^*} \Big[\sum_j P_{ij} K(j, x)\Big] d\nu(x) \le \int_{S^*} K(i, x)\, d\nu(x) = h_i .$$
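On a finite chain the space $S^*$ is just $S$ (a finite metric space is already complete), so a Borel measure $\nu$ is a weight vector on $S$ and the integral becomes a sum. The sketch below assumes a hypothetical three-state chain and the hypothetical uniform weights $\nu_j = 1/3$, forms $h_i = \sum_j K(i, j)\nu_j$, and checks that $h$ is superregular; since here $\pi K(\cdot, j) = 1$ for every $j \in S$, the bound $\pi h \le 1$ holds with equality.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    n = len(P)
    return inverse([[Fraction(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


P = [[Fraction(0), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(0), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 4), Fraction(0)]]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
n = len(P)

N = fundamental_matrix(P)
N_pi = vec_mat(pi, N)
K = [[N[i][j] / N_pi[j] for j in range(n)] for i in range(n)]

nu = [Fraction(1, 3)] * n  # hypothetical probability measure on S
h = [sum(K[i][j] * nu[j] for j in range(n)) for i in range(n)]
Ph = [sum(P[i][k] * h[k] for k in range(n)) for i in range(n)]
```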


Thus Borel measures on $S^*$ give rise to $\pi$-integrable non-negative
superregular functions $h$. Our goal in this section will be to prove
conversely that every non-negative (finite-valued) superregular function
$h$ with $\pi h < \infty$ arises as the integral over $S^*$ of $K(i, x)$ with respect to some
measure. We postpone the uniqueness question to Sections 6 and 7.

Throughout the remainder of this chapter we shall use "superregular"
to mean "finite-valued superregular."

Harmonic measure $\mu$ is defined on the Borel sets $E$ of $S^*$ by

$$\mu(E) = \Pr[x_\nu \in E].$$

By Theorem 10-18 the definition of $\mu$ makes sense and $\mu(S^*) = 1$.
The complete additivity of $\mu$ is a consequence of the complete additivity
of $\Pr$. Thus $\mu$ is a Borel measure. The proposition to follow gives a
formula for $\Pr_i[x_\nu \in E]$ in terms of harmonic measure.


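Proposition 10-21: For every Borel set $E$ of $S^*$,

$$\Pr_i[x_\nu \in E] = \int_E K(i, x)\, d\mu(x).$$

Proof: Let $\{E_n\}$ be a fixed increasing sequence of finite sets of $S$ with
union $S$. Let $\nu_n(\omega)$ be the last time (possibly $+\infty$) that an outcome on
the path $\omega$ is in $E_n$, and let $\nu_n(\omega) = 0$ if no outcome on $\omega$ is in $E_n$.
For any starting distribution $\gamma$ and any $j \in E_n$, Proposition 4-28 implies that

$$\Pr_\gamma[x_{\nu_n} = j] = \sum_{k=0}^{\infty} \Pr_\gamma[x_k = j \wedge x_m \notin E_n \text{ after time } k] = \sum_{k=0}^{\infty} (\gamma P^k)_j e^{E_n}_j = (\gamma N)_j e^{E_n}_j .$$

Hence

$$\Pr_i[x_{\nu_n} = j] = N_{ij} e^{E_n}_j = K(i, j) N_{\pi j} e^{E_n}_j = K(i, j) \Pr_\pi[x_{\nu_n} = j].$$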
For Borel sets $E$ of $S^*$ define measures by

$$\mu_{in}(E) = \Pr_i[x_{\nu_n} \in E], \qquad \mu_{\pi n}(E) = \Pr[x_{\nu_n} \in E],$$

$$\mu_i(E) = \Pr_i[x_\nu \in E], \qquad \mu_\pi(E) = \Pr[x_\nu \in E] = \mu(E).$$

What we have just shown is that

$$\mu_{in}(E) = \int_E K(i, x)\, d\mu_{\pi n}(x).$$

Now if $f \ge 0$ is a Borel measurable function on $S^*$, the claim is that

$$\int_{S^*} f(x)\, d\mu_{in}(x) = \int_\Omega f(x_{\nu_n}(\omega))\, d\Pr_i(\omega).$$

The result for characteristic functions is just the definition of $\mu_{in}$, and
for general $f \ge 0$ it follows from the result for simple functions by
monotone convergence. Then the result holds for continuous $f \ge 0$
and hence for all continuous $f$. Similarly, for continuous $f$,

$$\int_{S^*} f(x)\, d\mu_i(x) = \int_\Omega f(x_\nu(\omega))\, d\Pr_i(\omega),$$

where we set $f(x_\nu(\omega)) = 0$ when $x_\nu$ is not defined.
As $n \to \infty$, $x_{\nu_n}(\omega) \to x_\nu(\omega)$ a.e. $[\Pr_i]$ by Corollary 10-19. When $f$ is
continuous, $f(x_{\nu_n}(\omega)) \to f(x_\nu(\omega))$ a.e. $[\Pr_i]$. Since continuous functions
on the compact space $S^*$ are bounded, we have

$$\lim_n \int_\Omega f(x_{\nu_n}(\omega))\, d\Pr_i(\omega) = \int_\Omega f(x_\nu(\omega))\, d\Pr_i(\omega)$$

by dominated convergence. Hence

$$\lim_n \int_{S^*} f(x)\, d\mu_{in}(x) = \int_{S^*} f(x)\, d\mu_i(x)$$

for all continuous $f$. Similarly,

$$\lim_n \int_{S^*} f(x)\, d\mu_{\pi n}(x) = \int_{S^*} f(x)\, d\mu(x).$$

Since $K(i, \cdot)$ is continuous, so is $f(\cdot) K(i, \cdot)$. Therefore

$$\lim_n \int_{S^*} f(x) K(i, x)\, d\mu_{\pi n}(x) = \int_{S^*} f(x) K(i, x)\, d\mu(x)$$

for all continuous $f$. Since $\mu_{in}(E) = \int_E K(i, x)\, d\mu_{\pi n}(x)$, we obtain

$$\int_{S^*} f(x)\, d\mu_i(x) = \int_{S^*} f(x) K(i, x)\, d\mu(x)$$
for all continuous $f$. Since a finite Borel measure on the compact metric
space $S^*$ is determined by the integrals of continuous functions, it follows that

$$\mu_i(E) = \int_E K(i, x)\, d\mu(x).$$

Corollary 10-22: $\int_{S^*} K(i, x)\, d\mu(x) = 1$ for all $i \in S$.


Proof: Since $\Pr_i[x_\nu \in S^*] = 1$ by Corollary 10-19, the result follows
from Proposition 10-21.
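For a finite chain the boundary $B$ is empty, so harmonic measure sits on $S$ and can be written down in closed form: taking $E = S$ in the proof of Proposition 10-21, the escape probability is $e_j = 1 - \sum_k P_{jk}$ and $\mu(\{j\}) = \Pr[x_\nu = j] = (\pi N)_j e_j$. The sketch below, on a hypothetical three-state chain, checks that $\mu$ is a probability measure and that Corollary 10-22 holds, i.e. $\sum_j K(i, j)\mu(\{j\}) = \sum_j N_{ij} e_j = 1$ for every $i$.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    n = len(P)
    return inverse([[Fraction(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


P = [[Fraction(0), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(0), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 4), Fraction(0)]]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
n = len(P)

N = fundamental_matrix(P)
N_pi = vec_mat(pi, N)
K = [[N[i][j] / N_pi[j] for j in range(n)] for i in range(n)]

e = [1 - sum(row) for row in P]          # escape probabilities for E = S
mu = [N_pi[j] * e[j] for j in range(n)]  # harmonic measure of {j}: Pr_pi[x_nu = j]
```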


The corollary we have just proved is the representation theorem as it 
applies to the column vector which is identically one. We shall be 
able to get the general case by applying the corollary to a suitable 
modification of the h-process introduced in Chapter 8. We now 
re-define the h-process in such a way that we allow h to have some 
entries equal to zero. 


Definition 10-23: If $h \ge 0$ is a finite-valued $P$-superregular function
such that $\pi h = 1$, then the $h$-process is defined to be the Markov chain with
state space $S$ and with the measure $\Pr^h$ defined by

$$\Pr^h[x_0 = i_0 \wedge x_1 = i_1 \wedge \cdots \wedge x_n = i_n] = \pi_{i_0} P_{i_0 i_1} \cdots P_{i_{n-1} i_n} h_{i_n}.$$

We readily check that the $h$-process is indeed a Markov chain. If
we define $S^h$ by

$$S^h = \{i \in S \mid h_i > 0\},$$

then the transition matrix $P^h$ of the $h$-process satisfies

$$P^h_{ij} = \begin{cases} P_{ij} h_j / h_i & \text{for } i \text{ and } j \text{ in } S^h, \\ 0 & \text{for } i \in S^h \text{ and } j \in S - S^h, \end{cases}$$

and the starting vector $\pi^h$ satisfies

$$\pi^h_i = \pi_i h_i \quad \text{for all } i .$$

If $i$ is in $S - S^h$, then $P^h_{ij}$ is not defined, and we shall agree to take it to
be zero. With this definition we compute directly that the fundamental
matrix $N^h$ satisfies

$$N^h_{ij} = \begin{cases} N_{ij} h_j / h_i & \text{if } i \text{ and } j \text{ are in } S^h, \\ \delta_{ij} & \text{otherwise.} \end{cases}$$

Hence $P^h$ has only transient states.
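Lemma 10-24: If $h \ge 0$ is a $P$-superregular function and if $h_i = 0$
and $h_j > 0$, then $N_{ij} = 0$. If, in addition, $\pi h = 1$, then
$(\pi^h N^h)_j = h_j (\pi N)_j$, and $(\pi^h N^h)_j > 0$ if and only if $j$ is in $S^h$. For $i$ and $j$ in $S^h$,

$$K^h(i, j) = K(i, j) / h_i .$$

Proof: If $h_i = 0$ and $h_j > 0$, then for every $n$

$$0 = h_i \ge \sum_k P^{(n)}_{ik} h_k \ge P^{(n)}_{ij} h_j,$$

and hence $P^{(n)}_{ij} = 0$. Therefore $N_{ij} = \sum_n P^{(n)}_{ij} = 0$. Consequently, for $j$ in $S^h$,

$$(\pi^h N^h)_j = \sum_{i \in S^h} \pi_i h_i \left( \frac{N_{ij} h_j}{h_i} \right) = h_j \sum_{i \in S^h} \pi_i N_{ij} = h_j \sum_{i \in S} \pi_i N_{ij} = h_j (\pi N)_j,$$

while for $j$ in $S - S^h$ both sides are zero.
By assumption $\pi$ is a vector such that $\pi N$ is strictly positive. Therefore
$(\pi^h N^h)_j = 0$ if and only if $h_j = 0$.

Finally, according to the convention at the end of Section 3 and the
calculation just completed, $K^h(i, j)$ is defined if $i$ and $j$ are in $S^h$. We
have

$$K^h(i, j) = \frac{N^h_{ij}}{N^h_{\pi^h j}} = \frac{N_{ij} h_j / h_i}{h_j N_{\pi j}} = \frac{K(i, j)}{h_i}.$$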


Since the $h$-process has the property that $(\pi^h N^h)_j$ is positive exactly
when $j$ is in $S^h$, we can, as noted at the end of Section 3, define a metric
$d^h$ on $S^h \times S^h$, and we can form the Cauchy completion $S^{h*}$ with
boundary $B^h$. We shall agree to use the same weights in defining $d^h$
that were used in defining $d$.


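Lemma 10-25: The identity map from $(S^h, d^h)$ into $(S, d)$ is an
isometry.


Proof: Let $j$ and $j'$ be in $S^h$. Then

$$\begin{aligned}
d^h(j, j') &= \sum_{i \in S^h} w_i (\pi^h N^h)_i \,|K^h(i, j) - K^h(i, j')| \\
&= \sum_{i \in S^h} w_i N_{\pi i} h_i \,|K(i, j) - K(i, j')| / h_i \\
&= \sum_{i \in S^h} w_i N_{\pi i} \,|K(i, j) - K(i, j')| \\
&= \sum_{i \in S} w_i N_{\pi i} \,|K(i, j) - K(i, j')| \\
&= d(j, j'),
\end{aligned}$$

where the sum may be extended from $S^h$ to $S$ because $K(i, j) = K(i, j') = 0$
for $i \in S - S^h$ by Lemma 10-24.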
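Lemmas 10-24 and 10-25 can be checked on a small chain in which $h$ has zero entries. The sketch assumes a hypothetical four-state chain in which states 2 and 3 cannot reach state 1, and takes $h$ to be the potential of a unit charge at state 1 (column 1 of $N$), normalized so that $\pi h = 1$; then $S^h = \{0, 1\}$. The weights `w` are hypothetical and, as agreed in the text, shared by $d$ and $d^h$. Helper names are our own.

```python
from fractions import Fraction as F


def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    n = len(P)
    return inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


P = [[F(0), F(1, 2), F(1, 4), F(0)],
     [F(1, 4), F(0), F(0), F(1, 2)],
     [F(0), F(0), F(0), F(1, 2)],
     [F(0), F(0), F(1, 2), F(0)]]
pi = [F(1, 4)] * 4
n = 4
N = fundamental_matrix(P)
N_pi = vec_mat(pi, N)
K = [[N[i][j] / N_pi[j] for j in range(n)] for i in range(n)]

# h: potential of a unit charge at state 1, normalized so that pi h = 1.
raw = [N[i][1] for i in range(n)]
h = [x / sum(p * y for p, y in zip(pi, raw)) for x in raw]
S_h = [i for i in range(n) if h[i] > 0]

# h-process: P^h_ij = P_ij h_j / h_i on rows with h_i > 0, zero rows elsewhere.
P_h = [[P[i][j] * h[j] / h[i] if h[i] > 0 else F(0) for j in range(n)] for i in range(n)]
pi_h = [pi[i] * h[i] for i in range(n)]
N_h = fundamental_matrix(P_h)
N_h_pi = vec_mat(pi_h, N_h)
K_h = {(i, j): N_h[i][j] / N_h_pi[j] for i in S_h for j in S_h}

w = [F(1, 2 ** (i + 1)) for i in range(n)]  # hypothetical weights, shared by d and d^h


def d(j, jp):
    return sum(w[i] * N_pi[i] * abs(K[i][j] - K[i][jp]) for i in range(n))


def d_h(j, jp):
    return sum(w[i] * N_h_pi[i] * abs(K_h[i, j] - K_h[i, jp]) for i in S_h)
```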
It follows from Lemma 10-25 that $S^{h*}$ can be canonically identified
with a compact subset of $S^*$. Thus by continuity, $K^h(i, x) = K(i, x)/h_i$
for all $i$ in $S^h$ and $x$ in $S^{h*}$. Harmonic measure for the $h$-process will
be denoted $\mu^h$. We can consider it to be defined on $S^*$ (as well as on
$S^{h*}$) if we set

$$\mu^h(E) = \mu^h(E \cap S^{h*})$$

for Borel sets $E$ in $S^*$.


We are finally in a position to state and prove the existence half of
the Markov chain analog of the Poisson–Martin Representation
Theorem.


Theorem 10-26: If $h \ge 0$ is a finite-valued $P$-superregular function
such that $\pi h = 1$, then

$$h_i = \int_{S^*} K(i, x)\, d\mu^h(x).$$

Proof: Applying Corollary 10-22 to the $h$-process, we have

$$\int_{S^{h*}} K^h(i, x)\, d\mu^h(x) = 1$$

for $i$ in $S^h$. That is, for $i$ in $S^h$,

$$h_i = \int_{S^{h*}} K(i, x)\, d\mu^h(x) = \int_{S^*} K(i, x)\, d\mu^h(x).$$

Now if $i \in S - S^h$, then $N_{ij} = 0$ for all $j \in S^h$ by Lemma 10-24. Thus
$K(i, j) = 0$ for such $i$ and $j$. Since $K(i, x)$ is continuous on $S^{h*}$,
$K(i, x) = 0$ for $i \in S - S^h$ and $x \in S^{h*}$. Therefore for such $i$,

$$h_i = 0 = \int_{S^{h*}} K(i, x)\, d\mu^h(x) = \int_{S^*} K(i, x)\, d\mu^h(x).$$
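On a finite chain Theorem 10-26 reduces to a finite sum, because the boundary is empty and $\mu^h$ sits on $S$. Applying the escape-probability description of harmonic measure to the $h$-process gives $\mu^h(\{j\}) = (\pi^h N^h)_j e^h_j$ with $e^h_j = 1 - \sum_k P^h_{jk}$. The sketch below uses the same hypothetical four-state chain and $h$ as before and checks that $\mu^h$ is a probability measure, that $\mu^h(\{j\}) = (\pi N)_j f_j$ where $f = h - Ph$ is the charge of $h$, and that $h_i = \sum_j K(i, j)\mu^h(\{j\})$.

```python
from fractions import Fraction as F


def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(A):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix of Fractions."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental_matrix(P):
    n = len(P)
    return inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


P = [[F(0), F(1, 2), F(1, 4), F(0)],
     [F(1, 4), F(0), F(0), F(1, 2)],
     [F(0), F(0), F(0), F(1, 2)],
     [F(0), F(0), F(1, 2), F(0)]]
pi = [F(1, 4)] * 4
n = 4
N = fundamental_matrix(P)
N_pi = vec_mat(pi, N)
K = [[N[i][j] / N_pi[j] for j in range(n)] for i in range(n)]

raw = [N[i][1] for i in range(n)]  # hypothetical h: potential of a unit charge at 1
h = [x / sum(p * y for p, y in zip(pi, raw)) for x in raw]
P_h = [[P[i][j] * h[j] / h[i] if h[i] > 0 else F(0) for j in range(n)] for i in range(n)]
pi_h = [pi[i] * h[i] for i in range(n)]
N_h = fundamental_matrix(P_h)

e_h = [1 - sum(row) for row in P_h]  # escape probabilities of the h-process
mu_h = [x * y for x, y in zip(vec_mat(pi_h, N_h), e_h)]  # mu^h({j})
f = [h[i] - sum(P[i][k] * h[k] for k in range(n)) for i in range(n)]  # charge of h
```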


Of course, the representation theorem immediately extends to cover
all $P$-superregular functions $h \ge 0$ for which $\pi h$ is positive and finite.
However, the probabilistic interpretation of the measure $\mu^h$ is lost.
We shall return to this point in Theorem 10-41 of Section 7 after
proving the uniqueness theorem.


6. Extreme points of the boundary 


The measure $\mu^h$ is not necessarily the unique Borel measure which
represents $h$ in the sense of Theorem 10-26, and we consequently need
another hypothesis to get uniqueness. What we shall do in this section
is to define the set $B_e$ of extreme points of the boundary and the subset
$\bar S = S \cup B_e$ of $S^*$. In Section 7 we shall see that $\mu^h$ has all its mass on
$\bar S$ and that $\mu^h$ is the unique measure with all its mass on $\bar S$ for which the
representation in Theorem 10-26 holds.

There are three kinds of behavior of points of the boundary that we 
shall want to exclude: 


(1) $x$ has the property that $\pi K(\cdot, x) < 1$.

(2) $x$ has the property that $K(\cdot, x)$ is not regular.

(3) $x$ has the property that $K(\cdot, x)$ is regular but is a nontrivial
convex combination of other non-negative regular functions.


The first two of these possibilities are the topic of the two lemmas to
follow. The third possibility will require more of our attention, and
we discuss it beginning with Definition 10-29 and Lemma 10-30.

We recall that $\pi K(\cdot, x) \le 1$ for all $x$ in $S^*$.


Lemma 10-27: For almost every $x$ $[\mu]$ in $S^*$, $\pi K(\cdot, x) = 1$. The set
where the equality holds is a Borel set.


Proof: For each $i$, the function $K(i, x)$ is continuous. Hence the
countable sum $\pi K(\cdot, x)$ is Borel measurable. Therefore the set where
it equals one is a Borel set.

By Corollary 10-22,

$$\int_{S^*} K(i, x)\, d\mu(x) = 1 .$$

Thus by Fubini's Theorem,

$$1 = \pi 1 = \pi \int_{S^*} K(\cdot, x)\, d\mu(x) = \int_{S^*} \pi K(\cdot, x)\, d\mu(x).$$

But $\int_{S^*} 1\, d\mu(x) = 1$ also, and since $1 - \pi K(\cdot, x) \ge 0$, we conclude that
$\pi K(\cdot, x) = 1$ a.e. by Corollary 1-40.


We say that a function $h$ is normalized if $\pi h = 1$. By Lemma 10-27,
$K(\cdot, x)$ is normalized for a.e. $x$ $[\mu]$.
We recall that $K(\cdot, x)$ is $P$-superregular for all $x \in S^*$.


Lemma 10-28: For almost every $x$ $[\mu]$ in the boundary $B$ of $S^*$, the
function $K(\cdot, x)$ is regular. The set where it is regular is a Borel set.


Proof: The set where $P_i K(\cdot, x) = K(i, x)$ is a Borel set, since it is
the set where two Borel measurable functions of $x$ agree.
The set where $K(\cdot, x)$ is regular is the countable intersection of these
sets.
By Theorem 4-10 with the random time identically equal to one, we
see that the column vector whose $i$th component is (see Proposition
10-21)

$$\int_B K(i, x)\, d\mu(x) = \Pr_i[x_\nu \in B]$$

is $P$-regular. By this observation and by Fubini's Theorem, we have

$$\int_B P K(\cdot, x)\, d\mu(x) = P \int_B K(\cdot, x)\, d\mu(x) = \int_B K(\cdot, x)\, d\mu(x).$$

Since $P K(\cdot, x) \le K(\cdot, x)$, we must have $P K(\cdot, x) = K(\cdot, x)$ a.e. on $B$ by
Corollary 1-40.


Definition 10-29: A finite-valued function $h \ge 0$ is minimal if


(1) $h$ is regular, and
(2) whenever $0 \le h' \le h$ with $h'$ regular, then $h' = ch$.


Lemma 10-30: A normalized finite-valued regular function $h \ge 0$ is
minimal if and only if it cannot be written as a nontrivial convex
combination of two distinct normalized non-negative regular functions.


Proof: If $h = c_1 h_1 + c_2 h_2$ is such a convex combination, then either
$h_1$ or $h_2$, say $h_1$, is not equal to $h$. Since $h \ge c_1 h_1$, we must have
$c_1 h_1 = ch$ if $h$ is minimal. Multiplying through by $\pi$, we obtain
$c_1 = c$. Since $c_1 \neq 0$, we conclude $h_1 = h$, a contradiction.

Conversely, if $h \ge h' \ge 0$ with $h'$ regular and $h'$ not equal to $0$ or $h$,
then

$$h = (\pi h') \frac{h'}{\pi h'} + (1 - \pi h') \frac{h - h'}{1 - \pi h'}$$

exhibits $h$ as a nontrivial convex combination of normalized regular
functions, provided we can prove $0 < \pi h' < 1$. If so, then by
hypothesis the two normalized functions must be equal to each other
and hence equal to $h$. That is, $h' = (\pi h') h$. Thus we are to prove
$0 < \pi h' < 1$. Choose $i$ with $h'_i > 0$. Since $(\pi N)_i > 0$, choose $n$ so that
$(\pi P^n)_i > 0$. Since $h'$ is superregular, $h' \ge P^n h'$, and hence
$\pi h' \ge \pi P^n h' \ge (\pi P^n)_i h'_i > 0$. A similar argument applied to $h - h'$ shows
that $\pi h' < 1$.


Definition 10-31: A point $x$ in $S^*$ is an extreme point of $S^*$ if the
function $K(\cdot, x)$ is minimal and normalized. The set of extreme points
is denoted $B_e$. Let $\bar S = S \cup B_e$.
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Since K(-,j) is not regular when j eS, no point of S can be in B,, 
and B, must be entirely contained in the boundary B. The set S is 
the subset of S* with respect to which the uniqueness theorem will be 
stated. We shall see eventually that 8 is a Borel set and that y(S) = 1 
(compare with Lemmas 10-27 and 10-28). 
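For a finite transient chain you can check directly that $K(\cdot,j)$ fails to be regular exactly at $j$. Since $PN = N - I$, we have $(PK(\cdot,j))_i = K(i,j) - \delta_{ij}/(\pi N)_j$, and $\pi K(\cdot,j) = 1$. The Python sketch below verifies both facts with exact rationals. It assumes a hypothetical three-state chain `P` and starting vector `pi`, and the helper names are illustrative. A finite chain has no boundary, so the sketch only covers points $j$ of $S$.

```python
from fractions import Fraction as F


def inverse(A):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [a - fac * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental(P):
    """N = (I - P)^(-1) = I + P + P^2 + ... for a finite transient P."""
    n = len(P)
    return inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


# Hypothetical transient chain (every row sums to less than 1).
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 3), F(1, 3)]]
pi = [F(1, 2), F(1, 4), F(1, 4)]
n = len(P)

N = fundamental(P)
piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]
K = [[N[i][j] / piN[j] for j in range(n)] for i in range(n)]   # Martin kernel

normalization, defect = {}, {}
for j in range(n):
    col = [K[i][j] for i in range(n)]                          # the function K(., j)
    PK = [sum(P[i][k] * col[k] for k in range(n)) for i in range(n)]
    normalization[j] = sum(pi[i] * col[i] for i in range(n))   # pi K(., j)
    defect[j] = [col[i] - PK[i] for i in range(n)]             # K(., j) - P K(., j)
```

Each vector `defect[j]` has one nonzero entry, equal to $1/(\pi N)_j$, in position $j$.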

If we form an $h$-process, we know that $S^h \subset S^{h*} \subset S^*$. The
following lemma strengthens this conclusion and shows that actually
$\bar S^h \subset \bar S$.


Lemma 10-32: Let $h \ge 0$ be a finite-valued normalized $P$-superregular
function. If $x$ is in $S^{h*}$, then
(1) $K^h(\cdot,x)$ is normalized if and only if $K(\cdot,x)$ is normalized.
(2) $K^h(\cdot,x)$ is regular for $P^h$ restricted to $S^h$ if and only if $K(\cdot,x)$ is
$P$-regular.
(3) $K^h(\cdot,x)$ is minimal for $P^h$ restricted to $S^h$ if and only if $K(\cdot,x)$
is minimal for $P$.


Hence $B_e^h = S^{h*} \cap B_e$ and $\bar S^h \subset \bar S$.


PROOF: Conclusions (1) and (2) follow from the identities

$$\sum_{i\in S^h} \pi_ih_iK^h(i,x) = \pi K(\cdot,x)$$

and

$$\sum_{j\in S^h} P_{ij}h_jK^h(j,x) = (PK(\cdot,x))_i,$$

both of which use the fact that $K(i,x) = 0$ if $i$ is not in $S^h$ (see the proof
of Theorem 10-26).
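These identities rest on the relation $K^h(i,x) = K(i,x)/h_i$ between the Martin kernel of the $h$-process and that of the original chain. For $x = j$ in the state space, the relation follows from $N^h_{ij} = N_{ij}h_j/h_i$ and $(\pi^hN^h)_j = h_j(\pi N)_j$.

The minimal Python sketch below makes these assumptions:
* a small hypothetical transient chain;
* $h = K(\cdot,2)$, which is strictly positive here, so $S^h = S$;
* helper names that are illustrative.

It builds $P^h_{ij} = P_{ij}h_j/h_i$ and $\pi^h_i = \pi_ih_i$, computes the Martin kernel of the $h$-process from scratch, and compares it with $K/h$.

```python
from fractions import Fraction as F


def inverse(A):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [a - fac * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def martin_kernel(P, pi):
    """Return K with K[i][j] = N_ij / (pi N)_j, where N = (I - P)^(-1)."""
    n = len(P)
    N = inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])
    piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]
    return [[N[i][j] / piN[j] for j in range(n)] for i in range(n)]


def h_process(P, pi, h):
    """P^h_ij = P_ij h_j / h_i and pi^h_i = pi_i h_i (assumes every h_i > 0)."""
    n = len(P)
    Ph = [[P[i][j] * h[j] / h[i] for j in range(n)] for i in range(n)]
    pih = [pi[i] * h[i] for i in range(n)]
    return Ph, pih


# Hypothetical transient chain (every row sums to less than 1).
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 3), F(1, 3)]]
pi = [F(1, 2), F(1, 4), F(1, 4)]
n = len(P)

K = martin_kernel(P, pi)
h = [K[i][2] for i in range(n)]      # h = K(., 2): normalized, superregular, positive
Ph, pih = h_process(P, pi, h)
Kh = martin_kernel(Ph, pih)          # Martin kernel computed inside the h-process
```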

Thus in (3) we may assume that $K^h(\cdot,x)$ and $K(\cdot,x)$ are both regular.
Multiplying both by the same constant, if necessary, we may assume
for the purposes of this proof that they are normalized. We shall use
Lemma 10-30 and show that a nontrivial decomposition exists for
$K(\cdot,x)$ if and only if a nontrivial decomposition exists for $K^h(\cdot,x)$.
In fact, if for $i \in S$

$$K(i,x) = c_1h^{(1)}_i + c_2h^{(2)}_i$$

nontrivially, then for $i \in S^h$,

$$K^h(i,x) = K(i,x)/h_i = c_1\frac{h^{(1)}_i}{h_i} + c_2\frac{h^{(2)}_i}{h_i}.$$

We have

$$\sum_{j\in S^h} P^h_{ij}\frac{h^{(1)}_j}{h_j} = \frac{1}{h_i}\sum_{j\in S^h} P_{ij}h^{(1)}_j = \frac{1}{h_i}\sum_{j\in S} P_{ij}h^{(1)}_j = \frac{h^{(1)}_i}{h_i}$$

and

$$\sum_{i\in S^h} \pi^h_i\frac{h^{(1)}_i}{h_i} = \sum_{i\in S^h} \pi_ih^{(1)}_i = \sum_{i\in S} \pi_ih^{(1)}_i = 1,$$

where the sums over $S^h$ can be replaced by sums over $S$ because
$i \in S - S^h$ implies $0 \le h^{(1)}_i \le K(i,x)/c_1 = 0$, or $h^{(1)}_i = 0$. Consequently,
if $K^h(\cdot,x)$ is minimal, Lemma 10-30 gives

$$\frac{h^{(1)}_i}{h_i} = \frac{h^{(2)}_i}{h_i}$$

for $i \in S^h$. That is, $h^{(1)}_i = h^{(2)}_i$ for $i \in S^h$. But $h^{(1)}_i = h^{(2)}_i = 0$ for
$i \in S - S^h$. Hence $h^{(1)} = h^{(2)}$.
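Conversely, if for $i \in S^h$

$$K^h(i,x) = c_1h^{(1)}_i + c_2h^{(2)}_i,$$

then

$$K(i,x) = c_1h^{(1)}_ih_i + c_2h^{(2)}_ih_i$$

for $i \in S^h$. Extend $\{h^{(1)}_ih_i\}$ and $\{h^{(2)}_ih_i\}$ to be defined for all $i \in S$ by
setting them equal to zero for $i \in S - S^h$. The convex sum of them is
still $K(i,x)$, and they are both regular normalized functions, since

$$\sum_j P_{ij}h^{(1)}_jh_j = \sum_{j\in S^h} P_{ij}h_jh^{(1)}_j = h^{(1)}_ih_i$$

and

$$\sum_i \pi_ih^{(1)}_ih_i = \sum_{i\in S^h} \pi^h_ih^{(1)}_i = \pi^hh^{(1)} = 1.$$

Consequently, if $K(\cdot,x)$ is minimal, Lemma 10-30 gives $h^{(1)}_ih_i = h^{(2)}_ih_i$ for all $i \in S$.
That is, $h^{(1)}_i = h^{(2)}_i$ for all $i \in S^h$.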


We now begin to derive properties of the set $B_e$ of extreme points.


Lemma 10-33: If $h \ge 0$ is a normalized minimal function such that

$$h_i = \int_{S^*} K(i,x)\,d\nu(x)$$

for a Borel measure $\nu$ with $\nu(S^*) = 1$, then $\nu$ is concentrated at a single
point and that point is extreme.


PROOF: Consider the functions

$$h^A_i = \frac{1}{\nu(A)}\int_A K(i,x)\,d\nu(x) \quad\text{and}\quad h^{\tilde A}_i = \frac{1}{\nu(\tilde A)}\int_{\tilde A} K(i,x)\,d\nu(x),$$

where $\tilde A$ is the complement of $A$, as $A$ ranges through the Borel sets with $0 < \nu(A) < 1$. For any such
$A$, $h^A$ and $h^{\tilde A}$ are superregular and satisfy $\pi h^A \le 1$ and $\pi h^{\tilde A} \le 1$ by
Proposition 10-20. But

$$h = \nu(A)h^A + \nu(\tilde A)h^{\tilde A}$$

with $h$ regular and normalized and with $\nu(A) + \nu(\tilde A) = 1$. Hence $h^A$
and $h^{\tilde A}$ must both be regular and normalized. By Lemma 10-30,
$h = h^A = h^{\tilde A}$. Thus, if $0 < \nu(A) < 1$,

$$\nu(A)h_i = \int_A K(i,x)\,d\nu(x)$$

for each fixed $i$. The same is trivially true if $\nu(A) = 0$, and it is true by
hypothesis if $\nu(A) = 1$. Hence it is true of all Borel sets. Therefore
for fixed $i$,

$$K(i,x) = h_i \quad\text{a.e. } [\nu].$$

Since $S$ is countable, $K(i,x) = h_i$ for all $i$ almost everywhere $[\nu]$. Since $\nu(S^*) > 0$,
there is at least one point $x_0$ where it is true. We have

$$K(i,x_0) = h_i$$

for all $i$. If there were another such point $x'$, then we would have
$K(\cdot,x_0) = K(\cdot,x')$, and hence $x_0 = x'$ by Proposition 10-14. Therefore
the complement of $\{x_0\}$ has measure zero, or $\nu$ is concentrated at
$x_0$. Now, we know that $K(\cdot,x_0) = h$ and $h$ is normalized and
minimal. Hence $x_0$ is extreme.


Lemma 10-34: If $h$, $h^{(1)}$, and $h^{(2)}$ are normalized non-negative superregular
functions with

$$h = c_1h^{(1)} + c_2h^{(2)},$$

where $c_1 > 0$ and $c_2 > 0$, then

$$\mu^h = c_1\mu^{h^{(1)}} + c_2\mu^{h^{(2)}}.$$


PROOF: For a typical basic statement $x_0 = i \wedge x_1 = j \wedge x_2 = k$,
we have, by Definition 10-23,

$$\Pr^h[x_0 = i \wedge x_1 = j \wedge x_2 = k] = \pi_iP_{ij}P_{jk}h_k.$$

Using the analogous identities for $h^{(1)}$ and $h^{(2)}$ and breaking up $h_k$ as
$c_1h^{(1)}_k + c_2h^{(2)}_k$, we obtain

$$\Pr^h[x_0 = i \wedge x_1 = j \wedge x_2 = k] = c_1\Pr^{h^{(1)}}[x_0 = i \wedge x_1 = j \wedge x_2 = k] + c_2\Pr^{h^{(2)}}[x_0 = i \wedge x_1 = j \wedge x_2 = k].$$

Hence the same is true of all statements in $\mathcal F$, and by the uniqueness
half of Theorem 1-19 we find that

$$\Pr^h[p] = c_1\Pr^{h^{(1)}}[p] + c_2\Pr^{h^{(2)}}[p]$$

for all statements $p$ measurable relative to $\mathcal G$.
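The formula $\Pr^h[x_0 = i \wedge x_1 = j \wedge x_2 = k] = \pi_iP_{ij}P_{jk}h_k$ comes from telescoping $\pi^h_iP^h_{ij}P^h_{jk} = \pi_ih_i\cdot(P_{ij}h_j/h_i)\cdot(P_{jk}h_k/h_j)$. It is visibly linear in $h$, which is the property the proof uses. The sketch below checks both facts for a hypothetical finite transient chain with the assumed choices $h^{(1)} = K(\cdot,0)$, $h^{(2)} = K(\cdot,2)$, and $c_1 = c_2 = \frac12$.

```python
from fractions import Fraction as F


def inverse(A):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [a - fac * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def martin_kernel(P, pi):
    """Return K with K[i][j] = N_ij / (pi N)_j, where N = (I - P)^(-1)."""
    n = len(P)
    N = inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])
    piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]
    return [[N[i][j] / piN[j] for j in range(n)] for i in range(n)]


def cylinder_h(P, pi, h, i, j, k):
    """Pr^h[x0 = i, x1 = j, x2 = k] computed from pi^h and P^h (assumes h > 0)."""
    return (pi[i] * h[i]) * (P[i][j] * h[j] / h[i]) * (P[j][k] * h[k] / h[j])


# Hypothetical transient chain (every row sums to less than 1).
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 3), F(1, 3)]]
pi = [F(1, 2), F(1, 4), F(1, 4)]
n = len(P)

K = martin_kernel(P, pi)
h1 = [K[i][0] for i in range(n)]
h2 = [K[i][2] for i in range(n)]
c1, c2 = F(1, 2), F(1, 2)
h = [c1 * a + c2 * b for a, b in zip(h1, h2)]
triples = [(i, j, k) for i in range(n) for j in range(n) for k in range(n)]
```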
Take $p$ to be the statement $x_\nu \in E$, where $E$ is a Borel set of $S^*$. The
claim is that $x_\nu \in E$ if and only if $x^h_\nu \in E$, and similarly for $h^{(1)}$ and $h^{(2)}$.
For each $n$ we certainly have $x_n = x^h_n$. Hence $x_\nu = x^h_\nu$ when $\nu$ is
finite. When $\nu$ is infinite, we have $x_\nu = x^h_\nu$ because $x_n = x^h_n$ for all $n$
and because $S^{h*}$ is isometric with a subset of $S^*$. Therefore

$$\Pr^h[x_\nu \in E] = c_1\Pr^{h^{(1)}}[x_\nu \in E] + c_2\Pr^{h^{(2)}}[x_\nu \in E]$$

or

$$\mu^h = c_1\mu^{h^{(1)}} + c_2\mu^{h^{(2)}}.$$

Proposition 10-35: Let $h = K(\cdot,x)$. Then $x \in \bar S$ if and only if $h$ is
normalized and $\mu^h(\{x\}) = 1$.


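PROOF: First let $x \in S$. Then $h$ is certainly normalized, and hence
the $h$-process is defined. Suppose we can prove that the $h$-process
disappears with probability one. Then by definition of $\mu^h$, $\mu^h(S^h) = 1$.
Hence by Theorem 10-26,

$$K(\cdot,x) = \int_{S^*} K(\cdot,y)\,d\mu^h(y) = \int_{S^h} K(\cdot,y)\,d\mu^h(y) = \sum_j N_{\cdot j}\frac{\mu^h(\{j\})}{(\pi N)_j}.$$

But $K(\cdot,x) = N_{\cdot x}/(\pi N)_x = \sum_j N_{\cdot j}\big(\delta_{jx}/(\pi N)_j\big)$. By Theorem 8-4,

$$\frac{\delta_{jx}}{(\pi N)_j} = \frac{\mu^h(\{j\})}{(\pi N)_j}$$

for all $j$. That is, $\mu^h(\{x\}) = 1$. Thus we are to prove that the $h$-process disappears
with probability one. By the remarks following Definition
8-13, it suffices to show that $\sum_{j\in S^h} N^h_{ij}f_j = 1$ for every $i$, for some $f \ge 0$.
Since $N^h_{ix} = N_{ix}h_x/h_i = N_{xx}$ for every $i$, we may take $f$ to be a
single mass $1/N_{xx}$ at $x$, and the equality follows.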
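The choice of $f$ rests on the fact that, for $h = K(\cdot,x)$, the column $N^h_{\cdot x}$ of the $h$-process fundamental matrix is constant and equal to $N_{xx}$. The sketch below checks this for every state $x$ of a hypothetical finite transient chain, where all names and values are assumed. It also confirms that a single mass $1/N_{xx}$ at $x$ gives $N^hf = \mathbf 1$.

```python
from fractions import Fraction as F


def inverse(A):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [a - fac * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental(P):
    """N = (I - P)^(-1) for a finite transient P."""
    n = len(P)
    return inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


# Hypothetical transient chain (every row sums to less than 1).
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 3), F(1, 3)]]
pi = [F(1, 2), F(1, 4), F(1, 4)]
n = len(P)

N = fundamental(P)
piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]

results = {}
for x in range(n):
    h = [N[i][x] / piN[x] for i in range(n)]                    # h = K(., x), positive here
    Ph = [[P[i][j] * h[j] / h[i] for j in range(n)] for i in range(n)]
    Nh = fundamental(Ph)                                        # fundamental matrix of the h-process
    f = [F(1) / N[x][x] if j == x else F(0) for j in range(n)]  # single mass 1/N_xx at x
    Nh_f = [sum(Nh[i][j] * f[j] for j in range(n)) for i in range(n)]
    results[x] = (Nh, Nh_f)
```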

Next let $x \in B_e$. Then $h$ is normalized by definition. By Theorem
10-26,

$$K(\cdot,x) = \int_{S^*} K(\cdot,y)\,d\mu^h(y).$$

In Lemma 10-33 take $\nu$ to be $\mu^h$. Then $\mu^h(\{x_0\}) = 1$ for some $x_0$.
But then $K(\cdot,x) = K(\cdot,x_0)$, and hence $x = x_0$ by Proposition 10-14.

Conversely, suppose $h$ is normalized and $\mu^h(\{x\}) = 1$. If $x \notin S$, then
$x \in B$, and by Lemmas 10-28 and 10-32 (conclusion 2), $h$ must be regular.
It remains to show that $h$ is minimal. If $h = c_1h^{(1)} + c_2h^{(2)}$, then by
Lemma 10-34

$$\mu^h = c_1\mu^{h^{(1)}} + c_2\mu^{h^{(2)}}.$$

Hence $\mu^{h^{(1)}}$ must put all its weight on $x$, and therefore, by Theorem 10-26, $h^{(1)} = K(\cdot,x) = h$. By
Lemma 10-30 we conclude that $h$ is minimal.
7. Uniqueness of the representation


In this section we retain all of the notations of Sections 3 through 6.
We are going to prove that $\mu^h$ is the unique Borel measure concentrated
on $\bar S$ for which the representation in Theorem 10-26 holds. The first
step will be to show that $\bar S$ is a Borel set of harmonic measure one.

We denote by $S_N$ the set of points $x$ in $S^*$ for which $K(\cdot,x)$ is normalized.
These are exactly the points for which the $h$-process with
$h = K(\cdot,x)$ has been defined. By Lemma 10-27, $S_N$ is a Borel set of
harmonic measure one. Since $S_N$ is a Borel subset of $S^*$, the notion of
a Borel measurable function on $S_N$ is well defined.


Lemma 10-36: If $A$ is a fixed $\mathcal G$-measurable set in $\Omega$, then $\Pr^{K(\cdot,x)}[A]$
is a Borel measurable function of $x$ in $S_N$.


PROOF: By Definition 10-23,

$$\Pr^{K(\cdot,x)}[x_0 = i \wedge x_1 = j \wedge x_2 = k] = \pi_iP_{ij}P_{jk}K(k,x),$$

and the right side is continuous even for $x$ in $S^*$. Since any cylinder
set is the countable disjoint union of basic cylinder sets, the function
$\Pr^{K(\cdot,x)}[\omega \in A]$ for $A \in \mathcal F$ is a denumerable sum of such functions and
hence is Borel measurable.

Let $\mathcal C$ be the class of all sets $A$ in $\Omega$ for which $\Pr^{K(\cdot,x)}[\omega \in A]$ is Borel
measurable. If $\{A_n\}$ and $\{B_n\}$ are, respectively, increasing and
decreasing sequences of such sets, then

$$\Pr^{K(\cdot,x)}\Big[\bigcup_n A_n\Big] = \lim_n \Pr^{K(\cdot,x)}[A_n]$$

and

$$\Pr^{K(\cdot,x)}\Big[\bigcap_n B_n\Big] = \lim_n \Pr^{K(\cdot,x)}[B_n],$$

the latter equality holding since $\Pr^{K(\cdot,x)}$ is a finite measure. Hence
$\bigcup A_n$ and $\bigcap B_n$ are both in $\mathcal C$. By the Monotone Class Lemma (see
Halmos [1950], pp. 27-28), $\mathcal C$ contains $\mathcal G$.


Lemma 10-37: If $A$ is in $\mathcal G$ and if $C$ is a Borel set in $S_N$, then

$$\Pr[\omega \in A \wedge x_\nu \in C] = \int_C \Pr^{K(\cdot,x)}[A]\,d\mu(x).$$


PROOF: If $p$ is the typical basic statement $x_0 = i \wedge x_1 = j \wedge x_2 = k$,
then

$$\Pr[p \wedge x_\nu \in C] = \Pr[p]\cdot\Pr[x_\nu \in C \mid p] = \pi_iP_{ij}P_{jk}\Pr_k[x_\nu \in C]$$

by Theorem 4-9. By Proposition 10-21 and by Definition 10-23, this
expression is equal to

$$\pi_iP_{ij}P_{jk}\int_C K(k,x)\,d\mu(x) = \int_C \pi_iP_{ij}P_{jk}K(k,x)\,d\mu(x) = \int_C \Pr^{K(\cdot,x)}[p]\,d\mu(x).$$

By Lemma 10-36 the function $\Pr^{K(\cdot,x)}[A]$ is a Borel measurable function
of $x$ if $A$ is in $\mathcal G$. Fixing $C$, we may therefore define a set function $\sigma$ by

$$\sigma(A) = \int_C \Pr^{K(\cdot,x)}[A]\,d\mu(x).$$

Then $\sigma$ is certainly non-negative, and it is completely additive by the
Monotone Convergence Theorem. The calculation above shows that
the two set functions $\sigma(A)$ and $\Pr[\omega \in A \wedge x_\nu \in C]$ agree on basic
cylinder sets and hence on all of the field $\mathcal F$. By Theorem 1-19 they
must agree on all of $\mathcal G$.


Proposition 10-38: The set $\bar S$ is a Borel set with $\mu(\bar S) = 1$.


PROOF: Applying Lemma 10-37 to the statement $x_\nu \in D$, where $D$ is
a Borel set of $S^*$, we have for any Borel subset $C$ of $S_N$

$$\Pr[x_\nu \in D \wedge x_\nu \in C] = \int_C \Pr^{K(\cdot,x)}[x_\nu \in D]\,d\mu(x) = \int_C \mu^{K(\cdot,x)}(D)\,d\mu(x).$$

But by definition

$$\Pr[x_\nu \in D \wedge x_\nu \in C] = \mu(D \cap C) = \int_C \chi_D(x)\,d\mu(x).$$

The set on which two Borel measurable functions agree is a Borel set.
Hence for fixed $D$ the set of $x$'s on which

$$\mu^{K(\cdot,x)}(D) = \chi_D(x)$$

holds is a Borel set in $S_N$. Since these two functions have the same
$\mu$-integral over all Borel sets $C$, they must be equal a.e. $[\mu]$.

We shall let $D$ range over the intersection with $S_N$ of all balls with
centers in $S$ and with rational radii. Let $\mathcal T$ be the collection of such
balls and let

$$T = \{x \in S_N \mid \mu^{K(\cdot,x)}(D) = \chi_D(x) \text{ for all } D \text{ in } \mathcal T\}.$$

Remembering that $S_N$ is a Borel set of harmonic measure one, we see
that $T$ is the denumerable intersection of Borel sets of measure one and
is therefore a Borel set with $\mu(T) = 1$. We show $T = \bar S$.

First let $x \in T$. Choose $s_n$ in $S$ with $d(x,s_n) < 1/n$, and let $D_n$ be
the intersection with $S_N$ of the ball with center $s_n$ and radius $1/n$. By
assumption

$$\mu^{K(\cdot,x)}(D_n) = \chi_{D_n}(x) = 1$$

for all $n$. Hence

$$\mu^{K(\cdot,x)}(\{x\}) = \mu^{K(\cdot,x)}\Big(\bigcap_n D_n\Big) = 1.$$

Therefore $x \in \bar S$ by Proposition 10-35.
Conversely, if $x \in \bar S$, then

$$\mu^{K(\cdot,x)}(\{x\}) = 1$$

by Proposition 10-35, and so

$$\mu^{K(\cdot,x)}(D) = \chi_D(x)$$

for all Borel sets $D$ in $S_N$. Therefore $x \in T$, and $T = \bar S$.
Lemma 10-39: Let $h$ be defined by

$$h_i = \int_{\bar S} K(i,x)\,d\nu(x),$$

where $\nu$ is a measure with $\nu(\bar S) = 1$. Then the $h$-process is well defined,
and for any $A$ in $\mathcal G$

$$\Pr^h[A] = \int_{\bar S} \Pr^{K(\cdot,x)}[A]\,d\nu(x).$$

PROOF: By Proposition 10-20, $h$ is non-negative superregular. By
Fubini's Theorem $\pi h = 1$, since $\pi K(\cdot,x) = 1$ for all $x$ in $\bar S$. Hence the
$h$-process is well defined. If $p$ is a typical basic statement

$$x_0 = i \wedge x_1 = j \wedge x_2 = k,$$

then

$$\Pr^h[p] = \pi_iP_{ij}P_{jk}h_k = \int_{\bar S} \pi_iP_{ij}P_{jk}K(k,x)\,d\nu(x) = \int_{\bar S} \Pr^{K(\cdot,x)}[p]\,d\nu(x).$$

Proceeding as in Lemma 10-37, we define $\sigma$ by

$$\sigma(A) = \int_{\bar S} \Pr^{K(\cdot,x)}[A]\,d\nu(x).$$

Then $\sigma$ is a finite measure on $\mathcal G$ which agrees with the measure $\Pr^h[A]$
on $\mathcal F$. By Theorem 1-19,

$$\Pr^h[A] = \sigma(A)$$

for all $A$ in $\mathcal G$.


The theorem to follow is the uniqueness theorem mentioned at the 
beginning of this section. 


Theorem 10-40: If $h \ge 0$ is a normalized superregular function such
that

$$h_i = \int_{\bar S} K(i,x)\,d\nu(x)$$

for some measure $\nu$ on $S^*$ with $\nu(\bar S) = \nu(S^*) = 1$, then $\nu = \mu^h$.


PROOF: By Lemma 10-39,

$$\mu^h(C \cap \bar S) = \Pr^h[x_\nu \in C \cap \bar S] = \int_{\bar S} \Pr^{K(\cdot,x)}[x_\nu \in C \cap \bar S]\,d\nu(x).$$

But by Proposition 10-35, for $x$ in $\bar S$,

$$\Pr^{K(\cdot,x)}[x_\nu \in C \cap \bar S] = \mu^{K(\cdot,x)}(C \cap \bar S) = \chi_{C\cap\bar S}(x).$$

Hence for all Borel sets $C$

$$\mu^h(C \cap \bar S) = \int_{\bar S} \chi_{C\cap\bar S}(x)\,d\nu(x) = \nu(C \cap \bar S).$$

Since $\nu(\bar S) = 1$, we must have $\mu^h(\bar S) = 1$ and hence $\mu^h(C) = \nu(C)$ for
all Borel sets $C$ in $S^*$.


We have not yet proved that the representation in Theorem 10-40
does hold for the measure $\mu^h$, but this fact is a consequence of the
theorem to follow, which will summarize the results of the past three
sections.


Theorem 10-41: The $\pi$-integrable non-negative $P$-superregular
functions $h$ stand in one-one correspondence with the non-negative
finite measures $\nu$ on the Borel sets of $\bar S$, the correspondence of $h$ to $\nu$
being

$$h_i = \int_{\bar S} K(i,x)\,d\nu(x)$$

and satisfying $\pi h = \nu(\bar S)$. The unique representation $h = Nf + r$ with
$r$ regular arises by decomposing the integral over $\bar S$ into a part over $S$
and a part over $B_e$. If $h$ is normalized, then the measure that corresponds
to $h$ is $\mu^h$.


PROOF: By Fubini's Theorem any function of the form

$$h_i = \int_{\bar S} K(i,x)\,d\nu(x)$$

with $\nu$ finite is non-negative superregular and has $\pi h = \nu(\bar S)$.
Conversely, let $h$ be given. If $\pi h = 0$, then the superregularity of
$h$ implies that

$$0 = \pi h \ge \pi P^nh \ge 0$$

for all $n$ and hence $\pi P^nh = 0$ for all $n$. Thus $(\pi N)h = 0$. Since $\pi N$
is everywhere positive, $h = 0$. Thus existence and uniqueness of $\nu$
follow if $\pi h = 0$. Next, let $\pi h$ be positive. Since $h$ and $\nu$ must be
related by $\pi h = \nu(\bar S)$, we may, for both existence and uniqueness, divide
$h$ by an appropriate positive constant to obtain $\pi h = 1$. Uniqueness
of $\nu$ and the fact that $\nu = \mu^h$ then follow from Theorem 10-40. Existence
of $\nu$ follows from Theorem 10-26 provided we can show that
$\mu^h(S^* - \bar S) = 0$. By Lemma 10-32,

$$\bar S^h \subset S^{h*} \cap \bar S.$$

Hence

$$\mu^h(\bar S) \ge \mu^h(S^{h*} \cap \bar S) \ge \mu^h(\bar S^h).$$

But the right side equals one by Proposition 10-38 applied to the $h$-process. Thus $\mu^h(\bar S) = 1$
and $\mu^h(S^* - \bar S) = 0$.

Finally we have $\bar S = S \cup B_e$ disjointly, and an application of
Fubini's Theorem shows that

$$\int_{B_e} K(i,x)\,d\nu(x)$$

is regular. Since

$$\int_S K(i,x)\,d\nu(x) = \sum_j N_{ij}\frac{\nu(\{j\})}{(\pi N)_j},$$

the representation $h = Nf + r$ is as asserted.
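On the state space itself, the correspondence of Theorem 10-41 is the potential formula $\int_S K(i,x)\,d\nu(x) = (Nf)_i$ with $f_j = \nu(\{j\})/(\pi N)_j$. For a finite transient chain every non-negative superregular function is a potential, because the regular part vanishes. So $\nu$ can be recovered from $h$ through $f = (I - P)h$.

The sketch below makes these assumptions:
* a hypothetical chain;
* a hypothetical measure `nu` on $S$.

It builds $h$ from $\nu$, recovers $\nu$, and checks that $\pi h = \nu(S)$.

```python
from fractions import Fraction as F


def inverse(A):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [a - fac * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental(P):
    """N = (I - P)^(-1) for a finite transient P."""
    n = len(P)
    return inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


# Hypothetical transient chain (every row sums to less than 1).
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 3), F(1, 3)]]
pi = [F(1, 2), F(1, 4), F(1, 4)]
n = len(P)

N = fundamental(P)
piN = [sum(pi[i] * N[i][j] for i in range(n)) for j in range(n)]
K = [[N[i][j] / piN[j] for j in range(n)] for i in range(n)]

nu = [F(1, 3), F(0), F(2, 5)]                    # hypothetical finite measure on S
h = [sum(K[i][j] * nu[j] for j in range(n)) for i in range(n)]          # integral of K(i, .) over S
f = [h[i] - sum(P[i][k] * h[k] for k in range(n)) for i in range(n)]   # charge f = (I - P) h
nu_recovered = [f[j] * piN[j] for j in range(n)]                      # nu({j}) = f_j (pi N)_j
```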


8. Analog of Fatou's Theorem


In the classical case of the disk, which was discussed in Section 1,
normalized Lebesgue measure $m$ on the circle has the distinguishing
property that it corresponds to the function 1. If $h$ is the non-negative
harmonic function corresponding to a measure $\nu$ and if
$\nu = fm + \nu_s$ is the Lebesgue decomposition of $\nu$ with respect to $m$,
then Fatou's Theorem asserts that for almost every $x$ $[m]$ on the
circle, $h(re^{i\theta}) \to f(x)$ whenever $re^{i\theta}$ converges to $x$ nontangentially. In
this section we shall prove a Markov chain analog of this theorem in
terms of the Martin boundary and the measures given by Theorem
10-41. As expected, harmonic measure $\mu$ will play the role of Lebesgue
measure.

Our procedure will be first to derive an almost everywhere statement
in terms of the measure on the probability space and then to translate
this statement into a result in terms of harmonic measure. As a
preliminary to the first step, we consider a special case in the lemma
below.

We shall be dealing with expressions of the form $\lim_{n\to\infty} h(x_n(\omega))$ in
this section, where $h$ is non-negative and $P$-regular, and we shall adopt
the convention that $h(x_n(\omega)) = 0$ if $n > \nu$. This definition is motivated
by the following consideration: If $\bar P$ is the enlarged chain for $P$,
then a $P$-regular function $h$ extends to be regular for $\bar P$ if $h$ is defined
to be zero at the absorbing state; consequently if $\pi h$ is finite, $\{h(x_n)\}$ is a
martingale with $M[h(x_0)] = \pi h$.


Lemma 10-42: If $h \ge 0$ is a normalized bounded regular function,
then $\mu^h$ is absolutely continuous with respect to $\mu$ and

$$h_i = \int K(i,x)\,d\mu^h(x) = \int K(i,x)f(x)\,d\mu(x),$$

where $f$ is the Radon-Nikodym derivative of $\mu^h$ with respect to $\mu$.
The function $f$ may be taken to be zero on $S$, and if it is, then

$$\Pr\Big[\lim_{n\to\infty} h(x_n) = f(x_\nu)\Big] = 1.$$


PROOF: Since $h$ is bounded, $h \le c\mathbf 1$ for some constant $c > 1$. Noting
that

$$\mathbf 1 = \frac{c-1}{c}\Big[\frac{1}{c-1}(c\mathbf 1 - h)\Big] + \frac{1}{c}h,$$

set $g = (c-1)^{-1}(c\mathbf 1 - h)$. Then $g$ and $h$ are non-negative normalized
superregular functions, and Lemma 10-34 shows that

$$\mu = \frac{c-1}{c}\mu^g + \frac{1}{c}\mu^h.$$

Thus $\mu^h \le c\mu$ and $\mu^h$ is absolutely continuous with respect to $\mu$. By
the Radon-Nikodym Theorem there is a Borel function $f$ such that

$$\mu^h(C) = \int_C f(x)\,d\mu(x)$$

for all Borel sets $C$. Since $h$ is regular, $\mu^h(S) = 0$ and we may take $f$
to vanish on $S$.
By Theorem 10-41 we have

$$h_i = \int_{\bar S} K(i,x)\,d\mu^h(x) = \int_{\bar S} K(i,x)f(x)\,d\mu(x) = \int_{S^*} K(i,x)f(x)\,d\mu(x).$$

Now the argument in the proof of Proposition 10-21 shows that

$$\int_{S^*} K(i,x)f(x)\,d\mu(x) = M_i[f(x_\nu)].$$

Thus if $\Pr[x_0 = j_0 \wedge \cdots \wedge x_n = i] > 0$,

$$M[f(x_\nu) \mid x_0 = j_0 \wedge \cdots \wedge x_n = i] = M_i[f(x_\nu)] = h_i.$$

Similarly, if $b$ denotes the absorbing state in the enlarged chain, then

$$M[f(x_\nu) \mid x_0 = j_0 \wedge \cdots \wedge x_n = b] = 0 = h_b.$$

That is, if $\mathfrak X_n$ is the partition generated by $\{x_0, \ldots, x_n\}$, then

$$M[f(x_\nu) \mid \mathfrak X_n] = h(x_n).$$

On one hand, the Borel field generated by the $\mathfrak X_n$ is $\mathcal G$, and on the other
hand $f(x_\nu)$ is measurable over $\mathcal G$, since it is the composition of a Borel
function and a function for which the inverse image of every Borel
set is in $\mathcal G$. By Proposition 3-18,

$$\Pr\Big[\lim_{n\to\infty} h(x_n) = f(x_\nu)\Big] = 1.$$

The general case of the almost everywhere statement in terms of the 
measure on the probability space is covered by the following theorem. 


Theorem 10-43: Let $h \ge 0$ be a normalized regular function and let
$\mu^h = f\mu + \mu_s$ be the Lebesgue decomposition of $\mu^h$ with respect to $\mu$
(where $f$ is taken to be zero on the state space $S$); then

$$\Pr\Big[\lim_{n\to\infty} h(x_n) = f(x_\nu)\Big] = 1.$$


PROOF: If we let $k = \frac12\mathbf 1 + \frac12h$, then $k$ is a non-negative normalized
superregular function. Since $k$ is strictly positive, we have $S^k = S$
and $S^{k*} = S^*$. By Lemma 10-32, $B_e^k = B_e$. The function $h/k$ is a
bounded regular function for the $k$-process with

$$\sum_i \pi^k_i\frac{h_i}{k_i} = \sum_i \pi_ih_i = 1,$$

and hence Lemma 10-42 yields a function $g$ with

$$\Big(\frac{h}{k}\Big)_i = \int K^k(i,x)g(x)\,d\mu^k(x)$$

and

$$\Pr^k\Big[\lim_{n\to\infty} \frac{h(x^k_n)}{k(x^k_n)} = g(x^k_\nu)\Big] = 1.$$

As was pointed out in the proof of Lemma 10-34, $x^k_\nu$ is identical with $x_\nu$,
and also

$$\Pr^k[p] = \tfrac12\Pr[p] + \tfrac12\Pr^h[p]$$

for all $p$. Thus $\Pr^k[p]$ cannot be one unless $\Pr[p]$ and $\Pr^h[p]$ both
equal one, and we conclude that

$$\Pr\Big[\lim_{n\to\infty} \frac{h(x_n)}{k(x_n)} = g(x_\nu)\Big] = 1$$

or

$$\Pr\Big[\lim_{n\to\infty} \frac{2h(x_n)}{1 + h(x_n)} = g(x_\nu)\Big] = 1.$$

Since $\{h(x_n)\}$ is a non-negative martingale, $\lim h(x_n)$ exists a.e. $[\Pr]$ and
is finite. Therefore the above identity implies that

$$\Pr\Big[\lim_{n\to\infty} h(x_n) = \frac{g(x_\nu)}{2 - g(x_\nu)}\Big] = 1.$$

Thus to complete the proof, it suffices to prove that

$$f(x) = \frac{g(x)}{2 - g(x)} \quad\text{a.e. } [\mu].$$

First we identify $g$ as the Radon-Nikodym derivative of $\mu^h$ with
respect to $\mu^k$. On one hand, we have

$$h_i = k_i\int K^k(i,x)g(x)\,d\mu^k(x) = \int K(i,x)g(x)\,d\mu^k(x).$$

On the other hand, Theorem 10-41 gives

$$h_i = \int K(i,x)\,d\mu^h(x),$$

and thus the uniqueness part of Theorem 10-41 gives $\mu^h = g\mu^k$.
Now, by Lemma 10-34, we have $\mu^k = \frac12\mu + \frac12\mu^h$. Hence

$$f\mu + \mu_s = \mu^h = g\mu^k = \tfrac12g\mu + \tfrac12g\mu^h = \tfrac12g\mu + \tfrac12gf\mu + \tfrac12g\mu_s,$$

or

$$(2f - g - fg)\mu = (g - 2)\mu_s.$$

Since $\mu$ and $\mu_s$ are singular with respect to each other, each side is the
zero measure. For the left side, this statement means that

$$2f - g - fg = 0 \quad\text{a.e. } [\mu]$$

or

$$f = \frac{g}{2 - g} \quad\text{a.e. } [\mu].$$

The corollary to this theorem is the analog of Fatou's Theorem; it is
a translation of the theorem into a result in terms of harmonic measure.
The statement of the corollary needs a way of singling out for attention
a single point $x$ of $S^*$. One way of proceeding is to use the $K(\cdot,x)$-process,
at least if $x$ is in $\bar S$; for in that case Proposition 10-35 shows
that

$$\Pr^{K(\cdot,x)}[x_\nu = x] = 1.$$


Corollary 10-44: Let $h \ge 0$ be a normalized regular function, and let
$\mu^h = f\mu + \mu_s$ (with $f$ equal to zero on $S$) be the Lebesgue decomposition
of $\mu^h$ with respect to $\mu$. Then for almost every $x$ $[\mu]$ for which $K(\cdot,x)$
is normalized,

$$\Pr^{K(\cdot,x)}\Big[\lim_{n\to\infty} h(x_n) = f(x)\Big] = 1.$$


PROOF: By Theorem 10-43,

$$\Pr\Big[\lim_n h(x_n) = f(x_\nu)\Big] = 1.$$

Then by Lemma 10-39, applied with $\nu = \mu$,

$$1 = \Pr\Big[\lim_n h(x_n) = f(x_\nu)\Big] = \int_{\bar S} \Pr^{K(\cdot,x)}\Big[\lim_n h(x_n) = f(x_\nu)\Big]\,d\mu(x).$$

Since

$$\Pr^{K(\cdot,x)}\Big[\lim_n h(x_n) = f(x_\nu)\Big] \le 1$$

for every $x$, equality must hold for almost every $x$ $[\mu]$. But

$$\Pr^{K(\cdot,x)}[x_\nu = x] = 1$$

or

$$\Pr^{K(\cdot,x)}[f(x_\nu) = f(x)] = 1$$

for almost all $x$ in $\bar S$. Since $\mu(\bar S) = 1$, the corollary follows.

The results above clearly extend to all $\pi$-integrable regular $h \ge 0$
provided we replace $\mu^h$ everywhere by the unique measure $\nu$ associated
to $h$ by Theorem 10-41.

Since the function $f$ in all three of the above results is equal to zero
on $S$, we may think of $f$ as a function on $B_e$. If $f$ is so restricted, it is
called the fine boundary function of $h$.


9. Fine boundary functions


The results of Section 8 may be extended to non-negative $\pi$-integrable
superregular functions with the help of the proposition below. Our
convention that $h(x_n(\omega)) = 0$ if $n > \nu(\omega)$ is still in force.


Proposition 10-45: If $g$ is a $\pi$-integrable function of the form $Nf$ with
$f \ge 0$, then

$$\Pr\Big[\lim_{n\to\infty} g(x_n) = 0\Big] = 1.$$

For almost every $x$ $[\mu]$ for which $K(\cdot,x)$ is normalized,

$$\Pr^{K(\cdot,x)}\Big[\lim_{n\to\infty} g(x_n) = 0\Big] = 1.$$


PROOF: In the first statement, $g$ is non-negative superregular and $\pi g$
is finite; hence $\{g(x_n)\}$ is a non-negative supermartingale and
$g(x_n) \to z \ge 0$ a.e. $[\Pr]$. Now $g \ge P^ng$ for all $n$, and $P^ng \to 0$. By dominated
convergence $\pi P^ng \to 0$. Thus

$$M_\pi[z] \le \lim_n M_\pi[g(x_n)] = \lim_n \pi P^ng = 0$$

and $z = 0$ a.e. $[\Pr]$. The proof of the second statement in the proposition
is the same as the proof of Corollary 10-44.
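The step $\pi P^ng \to 0$ for a potential $g = Nf$ can be seen concretely. Since $P^nN = N - \sum_{m<n}P^m$, we have $\pi P^ng = \pi g - \sum_{m<n}\pi P^mf$, the tail of a convergent series. The sketch below computes $\pi P^ng$ exactly for $n = 0, \ldots, 30$ and checks this tail identity. It assumes a hypothetical finite transient chain and the charge $f = \mathbf 1$.

```python
from fractions import Fraction as F


def inverse(A):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [a - fac * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Hypothetical transient chain (every row sums to less than 1).
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 3), F(1, 3)]]
pi = [F(1, 2), F(1, 4), F(1, 4)]
n = len(P)
N = inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])

f = [F(1)] * n                                                  # hypothetical charge
g = [sum(N[i][j] * f[j] for j in range(n)) for i in range(n)]   # potential g = N f
row = list(pi)               # row vector pi P^m, advanced one step per iteration
values = []                  # values[m] = pi P^m g
partial = [F(0)]             # partial[m] = sum over l < m of pi P^l f
for m in range(31):
    values.append(sum(r * x for r, x in zip(row, g)))
    partial.append(partial[-1] + sum(r * x for r, x in zip(row, f)))
    row = [sum(row[k] * P[k][j] for k in range(n)) for j in range(n)]
```

The values decrease strictly because each step removes the positive term $\pi P^mf$.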


Thus if $h \ge 0$ is $\pi$-integrable and superregular, we may write, according
to Theorem 5-10, $h = Nf + r$, with $r$ regular and $Nf$ and $r$ both
$\pi$-integrable and non-negative. Corollary 10-44 and Proposition
10-45 may therefore be combined into a single result whenever
necessary.

The fine boundary function $f$ of a normalized minimal regular function
$h$ takes on an especially simple form. By Lemma 10-33, we must
have $h = K(\cdot,y)$ for some $y$ in $B_e$, and, by Proposition 10-35,

$$\mu^h(\{y\}) = 1.$$

There are two cases. First, if $\mu(\{y\}) = 0$, then $\mu^h$ is singular with
respect to $\mu$ and the fine boundary function of $h$ is zero. By Lemma
10-42, $h$ is unbounded. Second, if $\mu(\{y\}) = a > 0$, then the fine
boundary function may be taken as $1/a$ at $y$ and $0$ elsewhere, since

$$\mu^h(C) = \chi_C(y) = \int_C f(x)\,d\mu(x)$$

for all Borel sets $C$. Moreover, $h$ is bounded by $1/a$ since

$$1 \ge \Pr_i[x_\nu = y] = \int_{\{y\}} K(i,x)\,d\mu(x) = aK(i,y) = ah_i.$$


The class of $\pi$-integrable regular functions $h \ge 0$ with a given
$\mu$-integrable non-negative function $f$ as fine boundary function is
exactly the class of functions $h$ for which

$$\nu = f\mu + \nu_s$$

is the Lebesgue decomposition with respect to $\mu$ of the measure $\nu$
associated to $h$. Thus the class of such functions $h$ is exactly the class
of functions of the form

$$h_i = \int K(i,x)f(x)\,d\mu(x) + \int K(i,x)\,d\nu_s(x),$$

where $\nu_s$ is any Borel measure singular with respect to $\mu$. In this class
there is a unique smallest such function

$$\bar h_i = \int K(i,x)f(x)\,d\mu(x).$$

On one hand, Theorem 10-43 gives $\lim h(x_n) = f(x_\nu)$ a.e. $[\Pr]$, and
hence

$$M_\pi\Big[\lim_n h(x_n)\Big] = M_\pi[f(x_\nu)] = \int f(x)\,d\mu(x) = \pi\bar h.$$

On the other hand,

$$\lim_n M_\pi[h(x_n)] = \lim_n \pi P^nh = \pi h.$$

If $h_j > \bar h_j$ for some $j$, choose $n$ so that $(\pi P^n)_j > 0$. Then

$$\pi h = \pi P^nh > \pi P^n\bar h = \pi\bar h.$$

Thus by Proposition 1-52, $\{h(x_n)\}$ is uniformly integrable if and only if
$h = \bar h$.

There is a different topology for $S^*$ which is occasionally referred to
in connection with fine boundary functions. The fine topology for $S^*$
is defined in terms of its neighborhood system as follows: For any
point $x$ in $S^* - B_e$, every set in $S^*$ containing $x$ is to be a neighborhood
of $x$. For $x$ in $B_e$, the neighborhoods of $x$ are the sets $A$ in $S^*$ such that
$x$ is in $A$ and

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] = 1.$$

Evidently $x$ is in each of its neighborhoods, the intersection of two
neighborhoods is a neighborhood, and a superset of a neighborhood is a
neighborhood. We must check that $S^*$ is a neighborhood of $x$; that is,
that the $K(\cdot,x)$-process disappears with probability zero. But this
fact is a consequence of Proposition 10-35.

If $x$ is in $B_e$ and if $A$ is an open set containing $x$ in the metric
topology of $S^*$, then

$$1 \ge \Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] \ge \Pr^{K(\cdot,x)}[x_\nu \in A] = \mu^{K(\cdot,x)}(A) \ge \mu^{K(\cdot,x)}(\{x\}) = 1.$$

Therefore $A$ is open in the fine topology, and the fine topology is
stronger than the metric topology.

The next lemma and proposition show that a zero-one law holds for 
the probabilities which define the fine topology. The lemma by itself 
is of value in checking whether a non-negative regular function is 
minimal. 


Lemma 10-46: A non-negative normalized regular function $h$ is
minimal for $P$ if and only if the only bounded regular functions for $P^h$
(restricted to $S^h$) are constants.


PROOF: If $h$ is minimal, then the regularity of $h$ implies that $\mathbf 1$ is
regular for $P^h$ restricted to $S^h$. By adding a suitable multiple of $\mathbf 1$ to a
given bounded regular function for $P^h$, we may assume that it is
a non-negative bounded regular function for $P^h$. Thus let $\tilde h$ with
$0 \le \tilde h \le c\mathbf 1$ be a regular function for $P^h$ restricted to $S^h$. Set

$$k_i = \begin{cases} \tilde h_ih_i & \text{if } i \in S^h,\\ 0 & \text{otherwise.} \end{cases}$$

Then $k$ is $P$-regular and satisfies $0 \le k_i \le ch_i$. Since $h$ is minimal,
$k_i = c'h_i$ for all $i$ and hence $\tilde h_i = c'$ for $i$ in $S^h$. That is, $\tilde h = c'\mathbf 1$.

Conversely, if $h$ is not minimal, find a regular function $k$ with
$0 \le k \le h$ and $k \ne ch$. Set

$$\tilde h_i = k_i/h_i \quad\text{for } i \in S^h.$$

Then $\tilde h$ is a non-constant bounded regular function for $P^h$ restricted to $S^h$.

Proposition 10-47: If $x$ is in $B_e$, then, for any subset $A$ of $S$,

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] = 0 \text{ or } 1.$$
PROOF: The $h$-process for $h = K(\cdot,x)$ disappears with probability
zero, and thus

$$\Pr^{K(\cdot,x)}[x_n \in S \text{ for every } n] = 1.$$

Therefore

$$\begin{aligned}
\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}]
&= \Pr^{K(\cdot,x)}[x_n \in S - A \text{ only finitely often}]\\
&= 1 - \Pr^{K(\cdot,x)}[x_n \in S - A \text{ infinitely often}].
\end{aligned}$$

By Lemma 10-46, the only bounded regular functions for this process
are the constants, and thus, by Proposition 5-19, this last expression
must equal zero or one.


Proposition 10-47 shows that fine neighborhoods of $x$ in $B_e$ are those
sets $A$ in $S^*$ for which $x$ is in $A$ and

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] > 0.$$

The complement of a fine neighborhood of $x$ is called thin at $x$. Such
sets $A$ are characterized by the property

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ infinitely often}] = 0.$$

By Proposition 10-47 this probability must again be zero or one.

Let $h \ge 0$ be a $\pi$-integrable regular function with fine boundary
function $f$. The fine topology for $S^*$ has the property that the function
$h \cup f$ obtained by extending $h$ to $S^*$ by $f$ is continuous at almost every
$[\mu]$ point $x$ in $S^*$. In fact, the statement is trivial for $x$ not in $B_e$, and
it therefore suffices by Corollary 10-44 to prove the result for every
$x$ in $B_e$ for which

$$\Pr^{K(\cdot,x)}\Big[\lim_n h(x_n) = f(x)\Big] = 1.$$

Let $a < f(x) < b$. We are to produce a fine neighborhood of $x$ such
that

$$a < (h \cup f)(y) < b$$

for all $y$ in that neighborhood. Let $A$ be the set of points of $S$ for which
$a < h < b$ and form the set $A \cup \{x\}$. We shall show this is a neighborhood
of $x$. The convergence of $h(x_n)$ to $f(x)$ implies that

$$\Pr^{K(\cdot,x)}[a < h(x_n) < b \text{ from some time on}] = 1$$

or

$$\Pr^{K(\cdot,x)}[x_n \in A \text{ from some time on}] = 1.$$

Therefore

$$\Pr^{K(\cdot,x)}[x_n \in A \cup \{x\} \text{ from some time on}] = 1,$$

and $A \cup \{x\}$ is a fine neighborhood of $x$.
10. Martin entrance boundary


The Martin exit boundary was defined in terms of $P$ and a vector
$\pi \ge 0$ such that $\pi N$ is strictly positive. The completion $S^*$ of $S$ in the
metric $d$ served for a representation of all $\pi$-integrable superregular
functions $h \ge 0$ in terms of finite measures on this space.

In this section we shall introduce a different completion ${}^*S$ of $S$
which will serve for the representation of $P$-superregular measures.
As the analog of $\pi$, we fix once and for all a function $f \ge 0$ such that
$Nf$ is everywhere positive and finite-valued. The representation
theorem will be for superregular measures $\sigma \ge 0$ for which $\sigma f$ is finite.

The formalism is as follows: For $i$ and $j$ in $S$ we define

$$J(i,j) = \frac{N_{ij}}{(Nf)_i}.$$

Then the measure $J(i,\cdot)$ is $P$-regular everywhere except at $i$, where it
is strictly superregular, and it satisfies $J(i,\cdot)f = 1$ for all $i$. For each
$j$ the function $J(\cdot,j)$ is bounded by $N_{jj}/(Nf)_j$ because, if $g$ is defined as $Nf$,
then $g_i \ge H_{ij}g_j$ and hence

$$J(i,j) = \frac{N_{ij}}{g_i} = \frac{H_{ij}N_{jj}}{g_i} \le \frac{N_{jj}}{g_j}.$$

We define

$$d'(j,j') = \sum_i w_i(Nf)_i\,|J(j,i) - J(j',i)|,$$

where the $w_i$ are positive weights such that $\sum w_iN_{ii}$ is finite. The
bound we have just computed shows that $d'$ is finite-valued, and $d'$ is a
metric if we can show that $d'(j,j') = 0$ implies $j = j'$. But if
$d'(j,j') = 0$, then

$$J(j,\cdot) = J(j',\cdot).$$

Multiplying through by $P$ and supposing that $j \ne j'$, we obtain

$$J(j,j') = \sum_i J(j,i)P_{ij'} = \sum_i J(j',i)P_{ij'} < J(j',j'),$$

the strict inequality holding since $J(j',\cdot)$ is not regular at $j'$. Since $J(j',j') = J(j,j')$,
this is impossible. Thus $j = j'$ and $d'$ is a metric.

We define ${}^*S$ to be the Cauchy completion of $S$ in the metric $d'$; the
set ${}^*S - S$ is the Martin entrance boundary. A sequence $\{j_n\}$ is Cauchy
in $S$ if and only if the sequence $\{J(j_n,i)\}$ is Cauchy for every $i$.
Consequently $J(\cdot,i)$ extends to a continuous function on ${}^*S$. Then
$\{x_n\}$ is Cauchy in ${}^*S$ if and only if $\{J(x_n,i)\}$ is Cauchy for every $i$.
Two points $x$ and $y$ are equal if and only if $J(x,\cdot) = J(y,\cdot)$, and the
space ${}^*S$ is compact.

A $P$-superregular measure $\sigma$ is normalized if $\sigma f = 1$. It is minimal if it
is regular and if $0 \le \sigma' \le \sigma$ with $\sigma'$ regular implies $\sigma' = c\sigma$. Application
of Fatou's Theorem shows that $J(x,\cdot)$ is a $P$-superregular measure
for each $x$ and that it satisfies $J(x,\cdot)f \le 1$. A point $x$ of ${}^*S$ is extreme
if $J(x,\cdot)$ is minimal and normalized, and the set of extreme points is
denoted $B^e$. Then $B^e \cap S$ is empty.
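The three properties of $J$ used above can be checked exactly for a finite transient chain:
* the normalization $J(i,\cdot)f = 1$;
* regularity of $J(i,\cdot)$ except at $i$;
* the bound $J(i,j) \le N_{jj}/(Nf)_j$.

For the second property, $NP = N - I$ gives $J(i,\cdot)P = J(i,\cdot) - e_i/(Nf)_i$. The sketch assumes a hypothetical three-state chain, $f = (1,0,1)$, and unit weights $w_i = 1$. These weights are admissible here because $S$ is finite.

```python
from fractions import Fraction as F


def inverse(A):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [a - fac * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Hypothetical transient chain (every row sums to less than 1).
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 3), F(1, 3)]]
n = len(P)
N = inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])

f = [F(1), F(0), F(1)]                       # hypothetical f >= 0 with Nf > 0
Nf = [sum(N[i][k] * f[k] for k in range(n)) for i in range(n)]
J = [[N[i][j] / Nf[i] for j in range(n)] for i in range(n)]   # J(i, j) = N_ij / (Nf)_i
w = [F(1)] * n                               # unit weights (finite S)


def J_times_P(i):
    """The row vector J(i, .) P."""
    return [sum(J[i][k] * P[k][j] for k in range(n)) for j in range(n)]


def d_prime(j, jp):
    """d'(j, j') = sum_i w_i (Nf)_i |J(j, i) - J(j', i)|."""
    return sum(w[i] * Nf[i] * abs(J[j][i] - J[jp][i]) for i in range(n))
```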

The next theorem is the Markov chain analog of the Poisson-Martin
Representation Theorem for $P$-superregular measures.


Theorem 10-48: The sets $S$ and $B^e$ are Borel subsets of ${}^*S$. The
non-negative $P$-superregular measures $\sigma$ with $\sigma f$ finite stand in one-one
correspondence with the non-negative finite measures $\nu$ on the Borel
sets of $S \cup B^e$, the correspondence of $\sigma$ to $\nu$ being

$$\sigma_i = \int_{S\cup B^e} J(x,i)\,d\nu(x)$$

and satisfying $\sigma f = \nu(S \cup B^e)$. The unique representation $\sigma = \gamma N + \rho$
with $\rho$ regular arises by decomposing the integral over $S \cup B^e$
into a part over $S$ and a part over $B^e$.


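The proof will be accomplished by using duality, but we shall isolate
several of the steps beforehand. Let $\alpha > 0$ be a superregular measure
such that $\alpha f = 1$. Duality in the remainder of this section will mean
$\alpha$-duality. Let $\hat P = \text{dual } P$ and $\hat\pi = \text{dual } f$. Since

$$0 < \text{dual } g = \text{dual }(Nf) = \hat\pi\hat N$$

and

$$1 = \alpha f = (\text{dual } f)(\text{dual } \alpha) = \hat\pi\mathbf 1,$$

the exit boundary of $\hat P$ relative to $\hat\pi$ is defined. Let $\hat d$ be the defining
metric of the exit boundary, and let $\hat S^*$ be the completion of $S$ under
$\hat d$. We have

$$\hat K(i,j) = \frac{\hat N_{ij}}{(\hat\pi\hat N)_j} = \frac{J(j,i)}{\alpha_i}.$$


Lemma 10-49: If the same weights $w_i$ are used in defining $\hat d$ and $d'$,
then the identity map on $S$ extends to an isometry of $({}^*S, d')$ onto
$(\hat S^*, \hat d)$.

PROOF: We have

$$\begin{aligned}
d'(j,j') &= \sum_i w_i(Nf)_i\,|J(j,i) - J(j',i)|\\
&= \sum_i w_i\,\alpha_i(Nf)_i\,\Big|\frac{J(j,i)}{\alpha_i} - \frac{J(j',i)}{\alpha_i}\Big|\\
&= \sum_i w_i(\hat\pi\hat N)_i\,|\hat K(i,j) - \hat K(i,j')|\\
&= \hat d(j,j').
\end{aligned}$$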
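Lemma 10-49 can be checked on a finite example. The sketch below makes these assumptions:
* a hypothetical chain;
* $f = (1,0,1)$;
* $\alpha = \mathbf 1^{\mathsf T}N$, rescaled so that $\alpha f = 1$. This $\alpha$ is positive and superregular, since $\alpha P = \alpha - c\,\mathbf 1^{\mathsf T}$ for a positive constant $c$.

It forms the $\alpha$-duals $\hat P_{ij} = \alpha_jP_{ji}/\alpha_i$ and $\hat\pi_i = f_i\alpha_i$ and confirms the following:
* $\hat\pi\mathbf 1 = 1$;
* $(\hat\pi\hat N)_j = \alpha_j(Nf)_j$;
* $\hat K(i,j) = J(j,i)/\alpha_i$;
* $d' = \hat d$ with unit weights.

The exit metric is taken in the form $\hat d(j,j') = \sum_i w_i(\hat\pi\hat N)_i|\hat K(i,j) - \hat K(i,j')|$ used in the proof above.

```python
from fractions import Fraction as F


def inverse(A):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                fac = M[r][c]
                M[r] = [a - fac * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fundamental(P):
    """N = (I - P)^(-1) for a finite transient P."""
    n = len(P)
    return inverse([[F(int(i == j)) - P[i][j] for j in range(n)] for i in range(n)])


# Hypothetical transient chain (every row sums to less than 1).
P = [[F(0), F(1, 2), F(1, 4)],
     [F(1, 4), F(0), F(1, 2)],
     [F(0), F(1, 3), F(1, 3)]]
n = len(P)
N = fundamental(P)
f = [F(1), F(0), F(1)]

raw = [sum(N[i][j] for i in range(n)) for j in range(n)]        # 1^T N
scale = sum(raw[j] * f[j] for j in range(n))
alpha = [x / scale for x in raw]                                  # alpha f = 1
Nf = [sum(N[i][k] * f[k] for k in range(n)) for i in range(n)]
J = [[N[i][j] / Nf[i] for j in range(n)] for i in range(n)]

P_hat = [[alpha[j] * P[j][i] / alpha[i] for j in range(n)] for i in range(n)]  # alpha-dual of P
pi_hat = [f[i] * alpha[i] for i in range(n)]                                 # alpha-dual of f
N_hat = fundamental(P_hat)
piN_hat = [sum(pi_hat[i] * N_hat[i][j] for i in range(n)) for j in range(n)]
K_hat = [[N_hat[i][j] / piN_hat[j] for j in range(n)] for i in range(n)]
w = [F(1)] * n


def d_prime(j, jp):
    return sum(w[i] * Nf[i] * abs(J[j][i] - J[jp][i]) for i in range(n))


def d_hat(j, jp):
    return sum(w[i] * piN_hat[i] * abs(K_hat[i][j] - K_hat[i][jp]) for i in range(n))
```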
Under the identification of ${}^*S$ and $\hat S^*$,

$$J(x,\cdot) = \text{dual } \hat K(\cdot,x)$$

by the continuity of the functions $J(\cdot,i)$ and $\hat K(i,\cdot)$. Under duality a
$\hat P$-superregular function $h$ corresponds to a $P$-superregular row vector
$\sigma = \text{dual } h$. Since $\hat P = \text{dual } P$ and $\hat\pi = \text{dual } f$, regularity and
normalization are both preserved. But minimality is also preserved
since duality preserves inequalities. Therefore $B^e = \hat B_e$.


PROOF OF THEOREM 10-48: Let $h = \text{dual } \sigma$, and apply Theorem 10-41.
Then

$$h_i = \int_{S\cup\hat B_e} \hat K(i,x)\,d\nu(x)$$

with $\hat\pi h = \nu(S \cup \hat B_e)$. Therefore

$$\sigma_i = \alpha_ih_i = \int_{S\cup\hat B_e} \alpha_i\hat K(i,x)\,d\nu(x)$$

and $\sigma f = \nu(S \cup \hat B_e)$. If we identify $B^e$ and $\hat B_e$, and call $\nu$ by the same
name on $B^e$ as on $\hat B_e$, then we have

$$\sigma_i = \int_{S\cup B^e} J(x,i)\,d\nu(x)$$

as required. Conversely, if $\sigma$ is given by

$$\sigma_i = \int_{S\cup B^e} J(x,i)\,d\nu(x),$$

then

$$(\text{dual } \sigma)_i = \int_{S\cup\hat B_e} \hat K(i,x)\,d\nu(x).$$

Therefore $\text{dual } \sigma$ is $\hat P$-superregular and $\sigma$ must be superregular. Since
there is only one measure which represents $\text{dual } \sigma$, there can be only
one measure which represents $\sigma$. The last statement of the theorem
follows from the fact that the unique decomposition $\sigma = \gamma N + \rho$ is
transformed under duality into the unique decomposition of $\text{dual } \sigma$
in the $\hat P$-process.

If $\sigma > 0$, we can take $\alpha = \sigma$ and then the proof of Theorem 10-48
gives us the following interpretation for $\nu$: $\nu$ is harmonic measure for
the chain with transition matrix $\sigma$-dual $P$ and starting distribution
$\sigma$-dual $f$.


11. Application to extended chains 


In order to assign a probabilistic interpretation to the entrance boundary, we must pass to extended chains. Except where noted, we therefore assume throughout this section that $\{x_n\}$ is an extended chain with state space $S\cup\{a\}\cup\{b\}$, with (enlarged) transition matrix $P$, and with vector $v$ of mean times in the various states of $S$. Let $\pi$ and $f$ be row and column vectors, respectively, such that $\pi\ge 0$, $f\ge 0$, $\pi 1 = 1$, $\pi f = 1$, $\pi N > 0$, and $Nf > 0$. By Theorem 8-3, $Nf$ is finite-valued. We continue to use the notations of Sections 3 through 10 for boundary theory of $P$ relative to $\pi$ and $f$, and we shall use the vectors $\mu^E$ and $v^E$ defined for extended chains in Section 2.

Our first pair of propositions will give the "final" and "starting" distributions for the extended chain. If $x$ is in ${}^*S$ and if $E$ ranges over finite sets in $S$, we define

$$p(x) = \lim_{E\uparrow S}J(x,\cdot)e^E.$$

Similarly, if $y$ is in $S^*$, we define

$$q^v(y) = \lim_{E\uparrow S}\mu^EK(\cdot,y).$$

We first show that these limits always exist.

Lemma 10-50: For every $x$ in ${}^*S$ and $y$ in $S^*$, both $p(x)$ and $q^v(y)$ exist, possibly being infinite. Their defining limits are increasing limits as $E\uparrow S$. Each is a Borel measurable function on its domain, and the functions satisfy

$$p(x) = \lim_{E\uparrow S}\lim_{j\to x}\frac{h^E_j}{(Nf)_j}$$

and

$$q^v(y) = \lim_{E\uparrow S}\lim_{j\to y}\frac{v^E_j}{(\pi N)_j}.$$


Proof: For any finite set $E$,

$$h^E_j = \sum_{k\in E}N_{jk}e^E_k$$

and

$$\frac{h^E_j}{(Nf)_j} = \sum_{k\in E}J(j,k)e^E_k.$$

Since $E$ is finite, we may pass to the limit as $j\to x$ to get

$$\lim_{j\to x}\frac{h^E_j}{(Nf)_j} = J(x,\cdot)e^E.$$

For each fixed $j$, the expression $h^E_j/(Nf)_j$ increases with $E$, and hence so does the limit as $j\to x$. Therefore $p(x)$ exists, is defined by an increasing limit, and has the asserted value. If we choose an increasing sequence of finite sets $E_n$ with union $S$, then $p(x)$ is exhibited as the limit of a sequence of continuous functions and is therefore Borel measurable.

Similarly we have

$$v^E_j = \sum_{k\in E}\mu^E_kN_{kj}$$

and hence

$$\lim_{j\to y}\frac{v^E_j}{(\pi N)_j} = \mu^EK(\cdot,y).$$

Thus the same argument applies to $q^v(y)$, since $v^E$ increases with $E$.
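The proof uses two facts about $p(x)$: that $h^E = Ne^E$ with $e^E$ carried by $E$, and that $h^E$ increases with $E$. Both can be checked on a finite transient chain. The sketch below is only an illustration. The state space is finite, $P$ is random with row sums 0.8, the sets $E$ are arbitrary, and the function names are hypothetical. It computes $h^E$ by solving the usual first-step equations and takes $e^E = (I - P)h^E$.

```python
import numpy as np

def hitting_probabilities(P, E):
    """h^E_j = Pr_j[the chain ever enters E], for a finite transient substochastic P."""
    n = P.shape[0]
    E = np.asarray(sorted(E))
    out = np.setdiff1d(np.arange(n), E)          # states outside E
    h = np.zeros(n)
    h[E] = 1.0                                   # already in E at time 0
    if out.size:
        Q = P[np.ix_(out, out)]                  # steps that stay outside E
        R = P[np.ix_(out, E)]                    # steps from outside E into E
        h[out] = np.linalg.solve(np.eye(out.size) - Q, R.sum(axis=1))
    return h

def equilibrium_charge(P, h):
    """e^E = (I - P) h^E; it vanishes off E and is nonnegative on E."""
    return h - P @ h

def demo(n=6, seed=1):
    rng = np.random.default_rng(seed)
    P = rng.random((n, n))
    P = 0.8 * P / P.sum(axis=1, keepdims=True)   # transient: every row sums to 0.8
    N = np.linalg.inv(np.eye(n) - P)
    results = []
    for E in ({0}, {0, 1}, {0, 1, 2, 3}):        # an increasing family of finite sets
        h = hitting_probabilities(P, E)
        results.append((E, h, equilibrium_charge(P, h)))
    return N, results
```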


Proposition 10-51: On almost every path,

$$z_v(\omega) = \lim_{n\uparrow v}x_n(\omega)$$

exists in $S^*$. On almost every path for which the limit exists, either $v < +\infty$ and $z_v\in S$ or else $v = +\infty$ and $z_v\in B_e$. Moreover, for any Borel set $C$ in $S^*$,

$$\Pr[z_v\in C] = \int_Cq^v(y)\,d\mu(y).$$


Proof: The process watched starting in $E$ is a Markov chain with transition matrix $P$ and starting distribution $\mu^E$ if $E$ is a finite set in $S$. By Proposition 10-21, $z_v$ exists and has the required values for almost every path in this chain, and in this chain

$$\Pr_{\mu^E}[z_v\in C] = \sum_{i\in E}\mu^E_i\Pr_i[z_v\in C] = \sum_{i\in E}\mu^E_i\int_CK(i,y)\,d\mu(y) = \int_C\mu^EK(\cdot,y)\,d\mu(y).$$

As $E$ increases to $S$, we obtain almost all paths of $\{x_n\}$ in this way. Hence $z_v$ exists and is in $S\cup B_e$ a.e., and, by Definition 10-4,

$$\Pr[z_v\in C] = \lim_{E\uparrow S}\int_C\mu^EK(\cdot,y)\,d\mu(y).$$

By Lemma 10-50, the integrand increases with $E$. Replacing the sets $E$ by an increasing sequence $E_n$ and applying monotone convergence, we obtain

$$\Pr[z_v\in C] = \int_C\lim_{E\uparrow S}\mu^EK(\cdot,y)\,d\mu(y) = \int_Cq^v(y)\,d\mu(y).$$
Cc 
Proposition 10-52: On almost every path,

$$z_u(\omega) = \lim_{n\downarrow u}x_n(\omega)$$

exists in ${}^*S$. On almost every path for which the limit exists, either $u > -\infty$ and $z_u\in S$ or else $u = -\infty$ and $z_u\in B^e$. Moreover, for any Borel set $C$ in ${}^*S$,

$$\Pr[z_u\in C] = \int_Cp(x)\,d\mu^v(x),$$

where $\mu^v$ is the unique measure on $S\cup B^e$ representing $v$.


Proof: Form the reverse process and call its measure $\Pr'$. Its transition matrix restricted to the states of $S$ is the $v$-dual of $P$, denoted $\hat P$. Form the exit and entrance boundaries of $\hat P$ relative to $\hat\pi = v$-dual $f$ and $\hat f = v$-dual $\pi$. By Proposition 10-51, the limit

$$z_v = \lim_{n\uparrow v}x_n$$

exists in $\hat S^*$ a.e. $[\Pr']$, and either $v$ is finite and $z_v\in S$ or else $v$ is infinite and a.e. $z_v\in\hat B_e$. Therefore

$$z_u = \lim_{n\downarrow u}x_n$$

exists in $\hat S^*$ a.e. $[\Pr]$, and either $u$ is finite and $z_u\in S$ or else $u$ is infinite and a.e. $z_u\in\hat B_e$. Since $\hat S^*$ and ${}^*S$ are canonically identified according to Lemma 10-49, the first part of the proposition follows. Moreover, we have

$$\Pr[z_u\in C] = \Pr'[z_v\in C] = \int_C\lim_{E\uparrow S}[\hat\mu^E\hat K(\cdot,x)]\,d\mu^v(x),$$

where $\mu^v$ is the measure on $\hat S^*$ which represents 1, that is, the measure on ${}^*S$ which represents $v$. The above expression is

$$= \int_C\lim_{E\uparrow S}[J(x,\cdot)(\mathrm{dual}\,\hat\mu^E)]\,d\mu^v(x).$$

But $\hat\mu^E$, by Proposition 10-8, is the balayage charge of $v$ on $E$, and its dual is thus the balayage charge of 1 on $E$; that is, the charge $e^E$ of $h^E$. Hence the integrand is $\lim_{E\uparrow S}J(x,\cdot)e^E = p(x)$.


We thus obtain $\int_Cp(x)\,d\mu^v(x)$ and $\int_Cq^v(y)\,d\mu(y)$ as starting and final distributions for the extended chain. The fact that $v$ is normalized ($vf = 1$) implies that $\mu^v$ is a probability measure. Note that $p(x)$ depends only on $P$, $\pi$, and $f$, and not on the extended chain or even the vector $v$, but that $q^v(y)$ depends on $P$, $\pi$, $f$, and $v$. We shall return to this point shortly.
Conversely, if we select any probability measure $\mu^\sigma$ on $S\cup B^e$, then Theorem 10-48 yields a unique normalized $P$-superregular measure $\sigma$ represented by $\mu^\sigma$. If $\sigma$ is positive, Theorem 10-9 assures the existence of at least one extended chain with $\sigma$ as its vector of mean times. Any such chain starts in $C$ with probability

$$\int_Cp(x)\,d\mu^\sigma(x).$$

By varying $\mu^\sigma$ we may change the total measure on $\{x_n\}$, even as to whether it is finite or infinite.
We consider two special cases. First, if 0 is in $S$, let

$$\sigma_j = J(0,j) = N_{0j}/(Nf)_0.$$

By the uniqueness of the representation we must have $\mu^\sigma(\{0\}) = 1$, and hence we may represent $\sigma$ (provided it is positive) by an extended chain which has a.e. path starting at 0. The total measure of the process must be

$$\int_{S\cup B^e}p(x)\,d\mu^\sigma(x) = p(0)\mu^\sigma(\{0\}) = p(0),$$

and so the interpretation of $\sigma_j$ as the mean number of times in state $j$ shows that

$$\sigma_j = p(0)N_{0j}.$$

As a check, we can compute $p(0)$ directly. We have

$$p(0) = \lim_{E\uparrow S}\frac{h^E_0}{(Nf)_0} = \frac{1}{(Nf)_0},$$

as required.

Second, choose $\sigma = J(x,\cdot)$, where $x$ is in $B^e$. Then $\mu^\sigma(\{x\}) = 1$, and (if $\sigma$ is positive) an extended chain representing $\sigma$ in the sense of Theorem 10-9 must start a.e. at $x$ with total measure $p(x)$. Therefore, $J(x,\cdot)$ may be interpreted as the vector of mean times for any extended chain which is started almost surely at $x$, and, if $p(x)$ is finite, then

$$\frac{J(x,\cdot)}{p(x)}$$

is the conditional mean of the number of times in the various states, given that the process starts at $x$.


Returning to the case of a general $v$ and supposing that $p(x)$ is finite a.e. $[\mu^v]$, we see that the identity

$$v_j = \int_{S\cup B^e}\frac{J(x,j)}{p(x)}\,p(x)\,d\mu^v(x)$$

may be interpreted as follows: $v_j$ may be computed by choosing a starting state $x$ according to the starting distribution and then weighting it by the conditional mean of the number of times in $j$, given that the process starts at $x$.

As we noted earlier, there is an asymmetry between $p(x)$ and $q^v(y)$ in that one of these quantities depends on $v$ and the other does not. This asymmetry is cleared up by a discussion of $h$-processes for $P$; we shall see that $p(x)$ depends on $h$ and that $q^v(y)$ does not. The special case we have been considering so far will be seen to be the case $h = 1$.

In discussing the entrance boundary for $P^h$, where $h > 0$ is superregular and normalized, we choose

$$f^h_i = f_i/h_i$$

in analogy with the choice of $\pi^h$. Define $\sigma$ by

$$\sigma_j = v_jh_j.$$

Then $\sigma$ is positive, is $P^h$-superregular, and satisfies $\sigma f^h = vf = 1$. Direct calculation shows that

$$\sigma\text{-dual}(P^h) = v\text{-dual}\,P = \hat P$$

and that

$$\sigma\text{-dual}(f^h) = v\text{-dual}\,f = \hat\pi.$$

Therefore the entrance boundary for $P^h$ and $f^h$ is the same as that for $P$ and $f$.
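The "direct calculation" can be spot-checked numerically. In this sketch the state space is finite, $P$ is a random transient matrix, and $v$ and $h$ are potentials, hence positive and superregular. Normalization plays no role in the two identities and is skipped. All names are hypothetical.

```python
import numpy as np

def dual_matrix(A, m):
    """m-dual of a square matrix: (dual A)_ij = m_j * A_ji / m_i."""
    return A.T * m[None, :] / m[:, None]

def h_process(P, h):
    """Transition matrix of the h-process: P^h_ij = P_ij * h_j / h_i (h > 0)."""
    return P * h[None, :] / h[:, None]

def check_h_duality(n=5, seed=2):
    rng = np.random.default_rng(seed)
    P = rng.random((n, n))
    P = 0.85 * P / P.sum(axis=1, keepdims=True)
    N = np.linalg.inv(np.eye(n) - P)
    v = rng.random(n) @ N        # positive potential, so v is P-superregular
    h = N @ rng.random(n)        # positive potential, so h is P-superregular
    f = rng.random(n)
    sigma = v * h                # sigma_j = v_j h_j
    f_h = f / h                  # f^h_i = f_i / h_i
    return (dual_matrix(h_process(P, h), sigma), dual_matrix(P, v),
            sigma * f_h, v * f)  # sigma-dual of f^h and v-dual of f (row vectors)
```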

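Now let $\{x_n\}$ be an extended chain representing $P^h$ and $\sigma$ (existence by Theorem 10-9), and take $\pi^h$ and $f^h$ as the functions relative to which the exit and entrance boundaries are formed. The process starts in ${}^*S$ and goes to $S^{h*} = S^*$. To obtain the starting and final distributions, we need the functions $p$ and $q$. For $p$ we have

$$p^h(x) = \lim_{E\uparrow S}\lim_{j\to x}\frac{(h^E)^h_j}{(N^hf^h)_j} = \lim_{E\uparrow S}\lim_{j\to x}\frac{\sum_kB^E_{jk}h_k/h_j}{(Nf)_j/h_j} = \lim_{E\uparrow S}\lim_{j\to x}\frac{(B^Eh)_j}{(Nf)_j},$$

where $(h^E)^h$ denotes the function $h^E$ computed for the $h$-process. For $q$ we have

$$q^\sigma(y) = \lim_{E\uparrow S}(\mu^E)^hK^h(\cdot,y) = \lim_{E\uparrow S}\sum_{i\in E}\mu^E_ih_iK(i,y)/h_i = \lim_{E\uparrow S}\mu^EK(\cdot,y) = q^v(y).$$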
Therefore, in the chain $\{x_n\}$ we have as starting and final distributions

$$\Pr[z_u\in C] = \int_Cp^h(x)\,d\mu^v(x)$$

and

$$\Pr[z_v\in C] = \int_Cq^v(y)\,d\mu^h(y).$$

These relations show the symmetric roles of $p$ and $q$ and give us symmetric interpretations for $\mu^v$ and $\mu^h$.
In the special case of positive minimal $v$ and $h$, we must have

$$v = J(x_0,\cdot)\qquad\text{and}\qquad h = K(\cdot,y_0),$$

where $x_0$ and $y_0$ are in $B^e$ and $B_e$, respectively. In this case

$$\mu^v(\{x_0\}) = \mu^h(\{y_0\}) = 1.$$

Thus the process $\{x_n\}$ has almost every path starting at $x_0$ and going to $y_0$. The paths which start at $x_0$ have measure $p^h(x_0)$, and the paths which go to $b$ have measure $q^v(y_0)$. Therefore we must have

$$p^h(x_0) = q^v(y_0).$$


12. Proof of Theorem 10-9 
(1) Construction of the extended stochastic process. Let 
P = Pt and P = Pes 


Redefine $v$ as being restricted to the domain $S$. Let $v^E$ be the balayage potential of $v$ on a finite set $E$, with $\mu^E$ the balayage charge. For each $E$ we shall consider a Markov chain with transition matrix $P$ and starting distribution $\mu^E$, and we shall combine these into a single extended stochastic process by choosing a common time scale. For the sake of convenience we assume that $S = \{0,1,\ldots\}$. Our measure space will, as usual, be the double sequence space obtained from $S$, $a$, and $b$. To define the process $\{x_n\}$ it is sufficient to assign probabilities consistently to basic statements.

We shall use $\{y_n\}$ to denote the outcome functions for the Markov chain $P$ with various $\mu^E$ as starting distributions. These vectors are finite measures, but not necessarily probability measures. Since $S$ is the set of non-negative integers, it has a natural ordering on it. For each path $\omega$ in the ordinary sequence space of $P$, let $s(\omega)$ be the smallest numbered state on path $\omega$ and let $t(\omega)$ be the time that $s(\omega)$ is first entered. (Then $s$ and $t$ are defined everywhere except on the one path $(b,b,b,b,\ldots)$.)
There are two kinds of probability assignments to be made on the basic cylinder sets. First, if $i\in S$, define

$$\Pr[x_n = i\wedge x_{n+1} = j\wedge\cdots\wedge x_{n+m} = k] = \Pr_{\mu^E}[y_{t+n} = i\wedge y_{t+n+1} = j\wedge\cdots\wedge y_{t+n+m} = k],$$

where $E = \{i\}$ and where the right side is taken as 0 for the set of $\omega$ on which $t(\omega) + n < 0$. Second, define

$$\Pr[x_{-n-r} = a\wedge\cdots\wedge x_{-n-1} = a\wedge x_{-n} = i\wedge x_{-n+1} = j\wedge\cdots\wedge x_{-n+m} = k] = \mu_i\Pr_i[y_1 = j\wedge\cdots\wedge y_m = k\wedge t = n].$$

The effect of these definitions is to fix a time scale with this property: for each path there is a designated state such that the first entry into that state occurs at time 0. Then for $i\in S$, $\Pr[x_n = i]$ is finite for all $n$.
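The time convention can be illustrated on a finite stretch of path. This small Python sketch assumes that states are non-negative integers and works with a finite list. For an infinite path the smallest state is taken over the whole path, so applying the function to a prefix is only an illustration. The function name is hypothetical.

```python
def recentre(path):
    """Relabel the times of a finite path of non-negative integer states so that
    the first entry into its smallest state happens at time 0.

    Returns (s, t, timeline): s is the smallest state on the path, t is the
    index at which s is first entered, and timeline maps each new time n - t to
    the state visited at original index n.
    """
    if not path:
        raise ValueError("the path must contain at least one state")
    s = min(path)
    t = path.index(s)               # first entrance into s
    timeline = {n - t: state for n, state in enumerate(path)}
    return s, t, timeline
```

For example, `recentre([3, 1, 4, 1, 5])` returns $s = 1$ and $t = 1$, and it places state 3 at time $-1$.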
To show that $\{x_n\}$ is an extended stochastic process, we must check that the above definitions on cylinder sets are consistent. That is, we must show typically that

$$\sum_k\Pr[x_n = i\wedge x_{n+1} = j\wedge x_{n+2} = k] = \Pr[x_n = i\wedge x_{n+1} = j]$$

when $i$ and $j$ do not both equal $a$, and that

$$(*)\qquad\sum_i\Pr[x_{-n-1} = i\wedge x_{-n} = j\wedge x_{-n+1} = k] = \Pr[x_{-n} = j\wedge x_{-n+1} = k]$$

when $j$ and $k$ do not both equal $b$. The first identity is immediate from the above definitions. But to prove the second identity, we need some alternate expressions for the left side.

Let $i\in S$, let $E = \{i\}$, let $F$ be a finite set with $E\subset F$, and let $q$ be the statement $y_{t+n} = i\wedge y_{t+n+1} = j\wedge y_{t+n+2} = k$. By the first definition above,

$$\Pr[x_n = i\wedge x_{n+1} = j\wedge x_{n+2} = k] = \Pr_{\mu^E}[q].$$

We shall show that

$$\Pr_{\mu^F}[q] = \Pr_{\mu^E}[q].$$

Since the truth of $q$ depends only on events after the time $t_E$ when $E$ is reached, Theorem 4-10 gives

$$\Pr_{\mu^F}[q] = \sum_{m\in E}\Pr_{\mu^F}[y_{t_E} = m]\Pr_m[q] = \sum_{m\in E}(\mu^FB^E)_m\Pr_m[q].$$

Now $\mu^FB^E = \mu^E$, since $\mu^F$ and $\mu^E$ are balayage charges (see Section 2). Therefore

$$\Pr_{\mu^F}[q] = \sum_{m\in E}\mu^E_m\Pr_m[q] = \Pr_{\mu^E}[q].$$

It follows that

$$\Pr_{\mu^E}[q] = \lim_{F\uparrow S}\Pr_{\mu^F}[q],$$

which is the first of the two alternate identities we need.
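The identity $\mu^FB^E = \mu^E$ for $E\subset F$, together with the companion identity $B^FB^E = B^E$ used in step 5, can be checked in a finite transient chain. There the balayage charge of a potential $v = cN$ on $E$ is the solution of $\mu^E_EN_{EE} = v_E$, and it also equals $cB^E$. The finite chain, the charge $c$, the sets, and the names are illustrative assumptions.

```python
import numpy as np

def entrance_matrix(P, E):
    """B^E_ik = Pr_i[the first entrance into E (at time >= 0) is at k]."""
    n = P.shape[0]
    E = np.asarray(sorted(E))
    out = np.setdiff1d(np.arange(n), E)
    B = np.zeros((n, n))
    B[E, E] = 1.0                          # a chain started in E enters E at once
    if out.size:
        Q = P[np.ix_(out, out)]            # steps that stay outside E
        R = P[np.ix_(out, E)]              # steps from outside E into E
        B[np.ix_(out, E)] = np.linalg.solve(np.eye(out.size) - Q, R)
    return B

def balayage_charge(N, v, E):
    """Charge mu^E carried by E whose potential mu^E N agrees with v on E."""
    E = np.asarray(sorted(E))
    mu = np.zeros(len(v))
    mu[E] = np.linalg.solve(N[np.ix_(E, E)].T, v[E])   # mu_E N_EE = v_E
    return mu

def demo(n=6, seed=3):
    rng = np.random.default_rng(seed)
    P = rng.random((n, n))
    P = 0.8 * P / P.sum(axis=1, keepdims=True)
    N = np.linalg.inv(np.eye(n) - P)
    c = rng.random(n)
    v = c @ N                               # the potential of the charge c
    E, F = {0, 1}, {0, 1, 2, 4}             # E is contained in F
    return (c, balayage_charge(N, v, E), balayage_charge(N, v, F),
            entrance_matrix(P, E), entrance_matrix(P, F))
```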

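Next, we note that $v(I-P) = \mu$ by Theorem 5-10. Hence $\mu_i = \lim_{F\uparrow S}\mu^F_i$ by the property of balayage charges that they converge to the product of the given superregular measure and $I-P$. Thus

$$\begin{aligned}
\Pr[x_{-n-2} = a\wedge x_{-n-1} = a\wedge x_{-n} = i\wedge x_{-n+1} = j\wedge x_{-n+2} = k]
&= \mu_i\Pr_i[y_1 = j\wedge y_2 = k\wedge t = n]\\
&= \lim_{F\uparrow S}\mu^F_i\Pr_i[y_1 = j\wedge y_2 = k\wedge t = n]\\
&= \lim_{F\uparrow S}\Pr_{\mu^F}[y_0 = i\wedge y_1 = j\wedge y_2 = k\wedge t = n],
\end{aligned}$$

which is our second alternate identity.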


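Now we are in a position to prove (*). In the proof we shall use the abbreviation

$$Y = \lim_{F\uparrow S}\sum_{m=1}^{\infty}\sum_{i\in S-F}\Pr_{\mu^F}[y_{m-1} = i\wedge y_m = j\wedge y_{m+1} = k\wedge t = m+n].$$

We begin by using our two alternate identities and by performing a direct calculation. Recall that $i$ cannot equal $b$, since $j\ne b$.

$$\begin{aligned}
\sum_{i\in S\text{ or }i = a}&\Pr[x_{-n-1} = i\wedge x_{-n} = j\wedge x_{-n+1} = k]\\
&= \lim_{F\uparrow S}\sum_{i\in F}\Pr_{\mu^F}[y_{t-n-1} = i\wedge y_{t-n} = j\wedge y_{t-n+1} = k] + \lim_{F\uparrow S}\Pr_{\mu^F}[y_0 = j\wedge y_1 = k\wedge t = n]\\
&= \lim_{F\uparrow S}\Big\{\sum_{m=1}^{\infty}\sum_{i\in F}\Pr_{\mu^F}[y_{m-1} = i\wedge y_m = j\wedge y_{m+1} = k\wedge t = m+n] + \Pr_{\mu^F}[y_0 = j\wedge y_1 = k\wedge t = n]\Big\}\\
&= \lim_{F\uparrow S}\sum_{m=0}^{\infty}\Pr_{\mu^F}[y_m = j\wedge y_{m+1} = k\wedge t = m+n] - Y\\
&= \lim_{F\uparrow S}\Pr_{\mu^F}[y_{t-n} = j\wedge y_{t-n+1} = k] - Y\\
&= \Pr[x_{-n} = j\wedge x_{-n+1} = k] - Y.
\end{aligned}$$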
Thus we are to prove $Y = 0$. Now

$$0\le Y\le\lim_{F\uparrow S}\sum_{m=1}^{\infty}\sum_{i\in S-F}\Pr_{\mu^F}[y_{m-1} = i\wedge y_m = j] = \lim_{F\uparrow S}\sum_{i\in S-F}(\mu^FN)_iP_{ij}\le\lim_{F\uparrow S}\sum_{i\in S-F}v_iP_{ij},$$

since $\mu^FN = v^F\le v$. Since $\sum_iv_iP_{ij}$ is convergent, the right side is zero. Hence $Y = 0$. Therefore $\{x_n\}$ is an extended stochastic process.


(2) Verification of some of the properties of an extended chain. We must now check that $\{x_n\}$ satisfies the four properties of an extended chain. First we check (2). Consider the special case in which $E$ is the finite set $\{0,1,2,\ldots,i\}$. This set has the property that on any path beginning in a state of $E$ the statement $t\ge n$ implies the statement that there is an $m\ge n$ with $y_m\in E$. Therefore

$$\begin{aligned}
\lim_n\Pr[x_{-l}\in E\text{ for some }l\ge n]
&= \lim_n\sum_{l\ge n}\Pr[x_{-l}\in E\wedge x_{-l+1}\notin E\wedge\cdots\wedge x_{-n}\notin E]\\
&= \lim_n\sum_{l\ge n}\Pr_{\mu^E}[y_{t-l}\in E\wedge y_{t-l+1}\notin E\wedge\cdots\wedge y_{t-n}\notin E]\\
&= \lim_n\Pr_{\mu^E}[y_{t-l}\in E\text{ for some }l\ge n]\\
&\le \lim_n\Pr_{\mu^E}[t\ge n]\\
&\le \lim_n\Pr_{\mu^E}[(\exists m)\text{ with }m\ge n\text{ and }y_m\in E].
\end{aligned}$$

Since $P$ is transient and $\mu^E$ is finite, the right side is zero. Hence so is the left side. By Corollary 1-17 we conclude that

$$\Pr[u_E = -\infty] = 0.$$

For the general case of a finite set $F$, choose a finite set $E$ of the above form which contains it. Then

$$\Pr[u_F = -\infty]\le\Pr[u_E = -\infty] = 0.$$

Hence (2) holds.
Now suppose for the moment that we have proved (3) in the form that for any finite set $E$ the process watched starting in $E$ is a Markov chain with starting distribution $\mu^E$ and transition matrix $P$, and suppose we have identified the given measure $v$ as the vector of mean times in the various states. Then (4) follows, since $v$ was assumed finite-valued. To prove (1), that the domain of $u_E$ has positive measure, let $j\in E$. Then

$$0 < v_j = v^E_j = \sum_{i\in E}\mu^E_iN_{ij}.$$

Hence $\mu^E_i > 0$ for some $i\in E$. But $\mu^E1$ is the measure of the set on which $E$ is ever entered for the first time. That is, it is the measure of the set where $u_E > -\infty$. Hence (1) holds.

Thus we are to prove that the process watched starting in the finite set $E$ is a Markov chain with starting distribution $\mu^E$ and transition matrix $P$, and we are to identify $v$. Suppose we have proved this assertion for all finite sets of the form $E = \{0,1,2,\ldots,e\}$, and suppose $F$ is an arbitrary finite set. We show first how the assertion follows for $F$. Choose a set $E$ of the special form containing $F$. Then watching the process beginning in $F$ is the same as watching the process watched starting in $E$ beginning in $F$. Thus, since the $E$-process is a Markov chain, so is the $F$-process by Theorem 4-9. The starting distribution for the $F$-process is

$$\Pr_{\mu^E}[y_{t_F} = j] = \sum_{i\in S}\mu^E_i\Pr_i[y_{t_F} = j] = \sum_{i\in S}\mu^E_iB^F_{ij} = \mu^F_j.$$

Hence we may assume that $E$ is a set of the form $E = \{0,1,\ldots,e\}$. Let $\{z_n\}$ be the outcome functions for the process watched starting in $E$. We shall compute the probability of the typical basic statement $p$, defined as $z_0 = i\wedge z_1 = j\wedge z_2 = k$ where $i\in E$, by considering separately the contributions from paths of $\{x_n\}$ with $u > -\infty$ and paths with $u = -\infty$. The notation $\tilde E$ will mean $S - E$.


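(3) Contribution to Pr[p] from paths with $u > -\infty$. The contribution to $\Pr[p]$ of paths with $u > -\infty$ is

$$\begin{aligned}
\sum_{n=-\infty}^{\infty}\sum_{m=0}^{\infty}&\Pr[x_{-m-n-1} = a\wedge x_{-m-n}\in\tilde E\wedge\cdots\wedge x_{-n-1}\in\tilde E\wedge x_{-n} = i\wedge x_{-n+1} = j\wedge x_{-n+2} = k]\\
&= \sum_n\sum_m\Pr_\mu[y_0\in\tilde E\wedge\cdots\wedge y_{m-1}\in\tilde E\wedge y_m = i\wedge y_{m+1} = j\wedge y_{m+2} = k\wedge t = m+n].
\end{aligned}$$

The fact that $E$ is of the special form $\{0,1,\ldots,e\}$ means that if

$$y_0(\omega)\in\tilde E\wedge\cdots\wedge y_{m-1}(\omega)\in\tilde E\wedge y_m(\omega) = i,$$

then, since $i\in E$, we must have $t(\omega)\ge m$. Hence we need sum on $n$ only over $n\ge 0$. If $n\ge 0$, then for any $\omega$ for which $y_0,\ldots,y_{m-1}$ are not in $E$, we have that $t(\omega) = m+n$ if and only if $t(\omega_m) = n$. Therefore we may apply Theorem 4-9 to $\Pr_\mu[\,\cdot\,]$. In symbols the argument is that the contribution is

$$\begin{aligned}
&= \sum_{n\ge 0}\sum_{m\ge 0}\Pr_\mu[y_0\in\tilde E\wedge\cdots\wedge t = m+n]\\
&= \sum_{n\ge 0}\sum_{m\ge 0}\Pr_\mu[y_0\in\tilde E\wedge\cdots\wedge y_{m-1}\in\tilde E\wedge y_m = i]\times\Pr_i[y_1 = j\wedge y_2 = k\wedge t = n].
\end{aligned}$$

Let $B^{(m)}$ denote the matrix of probabilities of entry to $E$ at time $m$. The above expression is

$$\begin{aligned}
&= \sum_{n\ge 0}\sum_{m\ge 0}(\mu B^{(m)})_i\Pr_i[y_1 = j\wedge y_2 = k\wedge t = n]\\
&= \sum_{n\ge 0}(\mu B^E)_i\Pr_i[y_1 = j\wedge y_2 = k\wedge t = n]\\
&= (\mu B^E)_i\Pr_i[y_1 = j\wedge y_2 = k]\\
&= (\mu B^E)_iP_{ij}P_{jk}.
\end{aligned}$$

Thus the contribution is $(\mu B^E)_iP_{ij}P_{jk}$.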


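(4) Contribution to Pr[p] from paths with $u = -\infty$. The contribution to $\Pr[p]$ of paths with $u = -\infty$ is

$$\begin{aligned}
\sum_{n=-\infty}^{\infty}&\lim_{m\to\infty}\Pr[x_{-n-m}\in\tilde E\wedge\cdots\wedge x_{-n-1}\in\tilde E\wedge x_{-n} = i\wedge x_{-n+1} = j\wedge x_{-n+2} = k]\\
&= \sum_n\lim_m\sum_{s\in\tilde E}\Pr[x_{-n-m} = s\wedge x_{-n-m+1}\in\tilde E\wedge\cdots\wedge x_{-n+2} = k]\\
&= \sum_n\lim_m\sum_{s\in\tilde E}\lim_{F\uparrow S}\Pr_{\mu^F}[y_{t-m-n} = s\wedge y_{t-m-n+1}\in\tilde E\wedge\cdots\wedge y_{t-n+2} = k]\\
&= \sum_n\lim_m\sum_{s\in\tilde E}\lim_{F\uparrow S}\sum_{l=0}^{\infty}\Pr_{\mu^F}[y_l = s\wedge y_{l+1}\in\tilde E\wedge\cdots\wedge y_{l+m+2} = k\wedge t = l+m+n].
\end{aligned}$$

In the expression $\Pr_{\mu^F}[\,\cdot\,]$, for fixed $n$ we may assume that $m$ is large, say $m > |n|$. Then $t > l$. Now since $E$ is of the special form $\{0,1,\ldots,e\}$ and since $i\in E$, state $i$ is lower in the ordering on $S$ than any of the states of $\tilde E$. Hence

$$y_l(\omega) = s\wedge y_{l+1}(\omega)\in\tilde E\wedge\cdots\wedge t(\omega) = l+m+n$$

for $m > |n|$ if and only if

$$y_0(\omega)\in\tilde G\wedge\cdots\wedge y_{l-1}(\omega)\in\tilde G\wedge y_l(\omega) = s\wedge y_{l+1}(\omega)\in\tilde E\wedge\cdots\wedge t(\omega) = l+m+n,$$

where $G = \{0,1,\ldots,i\}$. In the presence of the information

$$y_0(\omega)\in\tilde G\wedge\cdots\wedge y_{l-1}(\omega)\in\tilde G\wedge y_l(\omega) = s,$$

it is true that $t(\omega) = l+m+n$ if and only if $t(\omega_l) = m+n$. Theorem 4-9 now applies. The contribution to $\Pr[p]$ therefore is

$$= \sum_n\lim_m\sum_{s\in\tilde E}\lim_{F\uparrow S}\sum_{l\ge 0}\Pr_{\mu^F}[y_0\in\tilde G\wedge\cdots\wedge y_{l-1}\in\tilde G\wedge y_l = s]\times\Pr_s[y_1\in\tilde E\wedge\cdots\wedge y_{m+2} = k\wedge t = m+n].$$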


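Next, we operate on the second factor. In the presence of the information $y_0(\omega) = s\wedge y_1(\omega)\in\tilde E\wedge\cdots\wedge y_{m-1}(\omega)\in\tilde E\wedge y_m(\omega) = i$, we know by the special form of $E$ that $t(\omega) = m+n$ if and only if $t(\omega_m) = n$. Hence, by Theorem 4-9, we have

$$\Pr_s[y_1\in\tilde E\wedge\cdots\wedge y_{m+2} = k\wedge t = m+n] = \Pr_s[y_1\in\tilde E\wedge\cdots\wedge y_m = i]\cdot\Pr_i[y_1 = j\wedge y_2 = k\wedge t = n].$$

The second factor on the right side of this last equation does not depend on $m$ and therefore factors out. We may sum it on $n$, and we get $P_{ij}P_{jk}$. Hence the contribution is

$$= \lim_m\sum_{s\in\tilde E}\lim_{F\uparrow S}\sum_{l\ge 0}\Pr_{\mu^F}[y_0\in\tilde G\wedge\cdots\wedge y_{l-1}\in\tilde G\wedge y_l = s]\times\Pr_s[y_1\in\tilde E\wedge\cdots\wedge y_m = i]\,P_{ij}P_{jk}.$$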
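The first factor here is

$$\sum_{l\ge 0}\Pr_{\mu^F}[y_0\in\tilde G\wedge\cdots\wedge y_{l-1}\in\tilde G\wedge y_l = s] = \mu^F_s + \sum_{l\ge 1}(\mu^F\,{}^GP^{l-1}P)_s = \mu^F_s + (\mu^F\,{}^GNP)_s.$$

Applying the identity ${}^GN = N - B^GN$ of Lemma 8-17, we have

$$\begin{aligned}
\mu^F\,{}^GNP &= \mu^F(N - B^GN)P\\
&= \mu^FNP - \mu^FB^GNP\\
&= \mu^FNP - \mu^GNP\qquad\text{if }F\supset G\\
&= v^FP - v^GP.
\end{aligned}$$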
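Since $v^F\to v$ monotonically and since $\mu^F\to\mu$, we have

$$\lim_{F\uparrow S}\sum_{l\ge 0}\Pr_{\mu^F}[y_0\in\tilde G\wedge\cdots\wedge y_{l-1}\in\tilde G\wedge y_l = s] = \mu_s + (vP)_s - (v^GP)_s = v_s - (v^GP)_s,$$

since $v(I-P) = \mu$. Thus the contribution is

$$= \lim_m\sum_{s\in\tilde E}(v - v^GP)_s\Pr_s[y_1\in\tilde E\wedge\cdots\wedge y_{m-1}\in\tilde E\wedge y_m = i]\,P_{ij}P_{jk}.$$

But

$$\Pr_s[y_1\in\tilde E\wedge\cdots\wedge y_{m-1}\in\tilde E\wedge y_m = i] = ({}^EP^{m-1}P)_{si}.$$

If, with the states of $E$ listed first,

$$P = \begin{pmatrix}T & U\\ R & Q\end{pmatrix},\qquad\text{then}\qquad {}^EP = \begin{pmatrix}0 & 0\\ 0 & Q\end{pmatrix}$$

and

$${}^EP^{m-1}P = \begin{pmatrix}0 & 0\\ Q^{m-1}R & Q^m\end{pmatrix}.$$

From this representation of ${}^EP^{m-1}P$, we see that the sum $\sum_{s\in\tilde E}$ may now be extended over all of $S$, and the contribution is

$$= \lim_m[(v - v^GP)({}^EP^{m-1}P)]_i\,P_{ij}P_{jk}.$$

Now

$$v^GP({}^EP^{m-1})P\le v^GP^{m+1}\to 0,$$

since $v^G$ is a potential. Therefore the contribution is

$$= \lim_m(v\,{}^EP^{m-1}P)_i\,(P_{ij}P_{jk}).$$

Since $i$ is a state of $E$, this expression is

$$= \lim_m(v_{\tilde E}Q^{m-1}R)_i\,(P_{ij}P_{jk}).$$

From the identity $v = vP + \mu$, we have

$$v_{\tilde E} = v_EU + v_{\tilde E}Q + \mu_{\tilde E}$$

or

$$v_{\tilde E}Q = v_{\tilde E} - (v_EU + \mu_{\tilde E}).$$

Iterating, we find that

$$v_{\tilde E}Q^{m-1} = v_{\tilde E} - (v_EU + \mu_{\tilde E})(I + Q + \cdots + Q^{m-2})$$

and hence

$$v_{\tilde E}Q^{m-1}R = v_{\tilde E}R - (v_EU + \mu_{\tilde E})(I + Q + \cdots + Q^{m-2})R.$$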
By monotone convergence,

$$\lim_mv_{\tilde E}Q^{m-1}R = v_{\tilde E}R - (v_EU + \mu_{\tilde E})\,{}^EN R,$$

where ${}^EN = I + Q + Q^2 + \cdots$ on $\tilde E$. As we noted in Section 2, the balayage potential $v^E$ satisfies

$$v^E = (v_E,\ v_EU\,{}^EN).$$

Hence

$$\lim_mv_{\tilde E}Q^{m-1}R = (v - v^E)_{\tilde E}R - \mu_{\tilde E}\,{}^EN R.$$

If we twice use the fact that $v$ and $v^E$ agree on $E$, we obtain

$$(v - v^E)_{\tilde E}R = [(v - v^E)P]_E = [(v - \mu) - (v^E - \mu^E)]_E = \mu^E_E - \mu_E.$$

Thus

$$\lim_mv_{\tilde E}Q^{m-1}R = \mu^E_E - (\mu_E + \mu_{\tilde E}\,{}^EN R) = (\mu^E - \mu B^E)_E.$$

We conclude that the contribution to $\Pr[p]$ is

$$= (\mu^E - \mu B^E)_iP_{ij}P_{jk}.$$
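In a finite transient chain no path has $u = -\infty$, so the limit computed in step 4 must vanish, which means $\mu^E = \mu B^E$ on $E$. This gives a convenient consistency check of the block computations. The sketch assumes a finite state space, a random $P$, a random charge $\mu$ with $v = \mu N$, and $E = \{0,\ldots,e\}$ listed first. The names are hypothetical.

```python
import numpy as np

def step4_identities(n=7, e=2, seed=4):
    rng = np.random.default_rng(seed)
    P = rng.random((n, n))
    P = 0.85 * P / P.sum(axis=1, keepdims=True)
    N = np.linalg.inv(np.eye(n) - P)
    mu = rng.random(n)                        # charge; v = mu N is P-superregular
    v = mu @ N
    E = np.arange(e + 1)                      # E = {0, 1, ..., e}
    Et = np.arange(e + 1, n)                  # E~ = S - E
    U, R, Q = P[np.ix_(E, Et)], P[np.ix_(Et, E)], P[np.ix_(Et, Et)]
    EN = np.linalg.inv(np.eye(Et.size) - Q)   # ^E N = I + Q + Q^2 + ... on E~
    vE = np.concatenate([v[E], v[E] @ U @ EN])          # v^E = (v_E, v_E U ^E N)
    muE = np.zeros(n)
    muE[E] = np.linalg.solve(N[np.ix_(E, E)].T, v[E])   # balayage charge mu^E
    lhs = (v - vE)[Et] @ R                    # (v - v^E)_{E~} R
    rhs = muE[E] - mu[E]                      # mu^E_E - mu_E
    muBE = mu[E] + mu[Et] @ EN @ R            # (mu B^E)_E
    return E, N, vE, muE, lhs, rhs, muBE
```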

(5) Completion of proof. From steps 3 and 4, we find that

$$\Pr[z_0 = i\wedge z_1 = j\wedge z_2 = k] = \mu^E_iP_{ij}P_{jk}\qquad\text{for }i\in E.$$

Hence

$$\Pr[z_2 = k\mid z_0 = i\wedge z_1 = j] = P_{jk}$$

if $\Pr[z_0 = i\wedge z_1 = j] > 0$. That is, $\{z_n\}$ is a Markov chain with matrix $P$ and with starting vector $\mu^E$, provided $E = \{0,1,\ldots,e\}$. We have seen how the proof for a general finite set follows from this special case.

We must compute the mean number of times in state $j$ in $\{x_n\}$. The mean number starting in $E$ is $(\mu^EN)_j = v^E_j$. Hence the mean total number is $\lim_Ev^E_j = v_j$ by monotone convergence.

Finally we show that the contribution to $v$ of the paths with $u > -\infty$ is $\mu N$. On these paths the process behaves as if it were started with distribution $\mu B^E$ (see step 3 if $E = \{0,1,\ldots,e\}$, and use the identity $B^EB^F = B^F$, where $E\supset F$, in the general case). Therefore the contribution is

$$\lim_E(\mu B^E)N = \mu N - \lim_E\mu\,{}^EN$$

by Lemma 8-17. Since ${}^EN\le N$ and $\lim_E{}^EN = 0$, $\mu\,{}^EN\to 0$ by dominated convergence. Thus the contribution is

$$\lim_E(\mu B^E)N = \mu N.$$
E 
13. Examples
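EXAMPLE 1: Basic example.

For the basic example, we recall from Chapter 5 that

$$N_{ij} = \begin{cases}\dfrac{\beta_j}{\beta} & \text{if } i\le j,\\[1ex] \dfrac{\beta_j}{\beta} - \dfrac{\beta_j}{\beta_i} & \text{if } i > j.\end{cases}$$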


In a process of this kind, in which all pairs of states communicate, there is no gain in choosing a complicated starting vector $\pi$. Indeed, if we choose $\pi$ to be a unit mass at 0, then all finite-valued $P$-regular functions are certainly $\pi$-integrable, and therefore no other choice of $\pi$ will make the representation theorem yield additional regular functions. With $\pi$ chosen as a unit mass at 0, $K(i,j)$ becomes

$$K(i,j) = \begin{cases}1 & \text{if } i\le j,\\ 1 - \beta/\beta_i & \text{if } i > j.\end{cases}$$


Since $K$ is of a particularly simple form, it is possible to compute the metric for the exit boundary. If $j < j'$, then

$$d(j,j') = \sum_iw_iN_{0i}\,|K(i,j) - K(i,j')| = \sum_{j<i\le j'}w_i\left(\frac{\beta_i}{\beta}\right)\left(\frac{\beta}{\beta_i}\right) = \sum_{j<i\le j'}w_i.$$

The metric space $(S,d)$ thus is isometric to the subset of the real line consisting of the points $0, w_1, w_1+w_2, w_1+w_2+w_3, \ldots$. Its Cauchy completion contains the one extra point corresponding to $\sum_{i\ge 1}w_i$.
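The closed forms above can be checked with a short computation. The sketch reads the basic example as the chain with $P_{i,i+1} = p_i$ and $P_{i,0} = 1 - p_i$, with $\beta_i = p_0p_1\cdots p_{i-1}$. That reading of Chapter 5 is an assumption, and the computation is restricted to the states $0,\ldots,M$. The sequence $p_i$, the weights $w_i$, and the numerical stand-in for $\beta$ are illustrative. The test verifies the row identity $N_{ij} - p_iN_{i+1,j} - (1-p_i)N_{0j} = \delta_{ij}$ and the formula $d(j,j') = \sum_{j<i\le j'}w_i$. The row identity holds for every positive constant in place of $\beta$. It is the minimality of $N$ that forces $\beta = \lim\beta_i$, and this sketch does not check that.

```python
import numpy as np

def basic_example(p, beta):
    """Closed forms on the states 0, ..., M (M = len(p)), assuming P_{i,i+1} = p_i,
    P_{i,0} = 1 - p_i, beta_i = p_0 ... p_{i-1}, and `beta` standing for lim beta_i.
    Returns (b, N, K), with K computed for pi = unit mass at 0."""
    M = len(p)
    b = np.concatenate([[1.0], np.cumprod(p)])        # beta_0, ..., beta_M
    i = np.arange(M + 1)[:, None]
    j = np.arange(M + 1)[None, :]
    N = np.where(i <= j, b[j] / beta, b[j] / beta - b[j] / b[i])
    K = N / N[0][None, :]                              # K(i, j) = N_ij / N_0j
    return b, N, K

def exit_metric(N, K, w, j, jp):
    """d(j, j') = sum_i w_i N_0i |K(i, j) - K(i, j')|, summed over the states 0..M."""
    return float(np.sum(w * N[0] * np.abs(K[:, j] - K[:, jp])))

p = 1.0 - 0.5 ** np.arange(2, 12)                      # p_0, ..., p_9 (illustrative)
beta = np.prod(1.0 - 0.5 ** np.arange(2, 200))         # numerical stand-in for lim beta_i
b, N, K = basic_example(p, beta)
```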

Alternatively we can find $S^*$ directly without using the metric. In fact, if $\{j_n\}$ is a sequence of points in $S$, then $\{K(\cdot,j_n)\}$ is clearly Cauchy if and only if either $\{j_n\}$ is eventually constant or $\{j_n\}$ tends to infinity. Either way, there is exactly one boundary point, which we may call $+\infty$, and the relative topology for $S$ is discrete. Moreover,

$$K(i,+\infty) = \lim_nK(i,j_n) = 1.$$

Since 1 is regular and since there is only one boundary point, the boundary point must be extreme. Moreover, every regular function $h\ge 0$ must be constant. Since the process does not disappear, harmonic measure $\mu$ must assign unit mass to $+\infty$.

Next, we consider the entrance boundary. If $f$ is a unit mass at 0, then $J$ is given by

$$J(i,j) = \frac{N_{ij}}{N_{i0}} = \begin{cases}\beta_j/\beta_0 & \text{if } i = 0,\\[1ex] \dfrac{\beta_j/\beta}{1/\beta - 1/\beta_i} & \text{if } 0 < i\le j,\\[2ex] \dfrac{\beta_j/\beta - \beta_j/\beta_i}{1/\beta - 1/\beta_i} = \beta_j & \text{if } i > j.\end{cases}$$

The sequence $1, 2, 3, \ldots$ again is Cauchy, so that there is necessarily exactly one limit point in ${}^*S$. But, since $J(n,i) = \beta_i = J(0,i)$ whenever $n > i$,

$$\lim_{n\to\infty}J(n,i) = J(0,i),$$

and consequently ${}^*S = S$ and the entrance boundary is empty. The state 0 is a limit point, and thus $S$ does not have the discrete topology. Since the entrance boundary is empty, there are no non-zero regular measures $\sigma\ge 0$. (We had arrived at this conclusion in Chapter 5, too.)
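A quick check of $\lim_nJ(n,i) = J(0,i)$ follows. It uses the same reading of the basic example as before ($P_{i,i+1} = p_i$, $P_{i,0} = 1 - p_i$, $\beta_i = p_0\cdots p_{i-1}$), which is an assumption, and the closed form for $N$ recalled from Chapter 5. The values of $p_i$ and the stand-in for $\beta$ are illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def entrance_kernel_basic(p, beta):
    """J(i, j) = N_ij / N_i0 for the basic example with f = unit mass at 0, using
    N_ij = beta_j/beta (i <= j) and beta_j/beta - beta_j/beta_i (i > j), under the
    assumptions P_{i,i+1} = p_i, P_{i,0} = 1 - p_i, beta_i = p_0 ... p_{i-1}."""
    M = len(p)
    b = np.concatenate([[1.0], np.cumprod(p)])
    i = np.arange(M + 1)[:, None]
    j = np.arange(M + 1)[None, :]
    N = np.where(i <= j, b[j] / beta, b[j] / beta - b[j] / b[i])
    return b, N / N[:, [0]]                            # divide row i by N_i0

p = 1.0 - 0.5 ** np.arange(2, 12)                      # illustrative p_0, ..., p_9
beta = np.prod(1.0 - 0.5 ** np.arange(2, 200))         # stand-in for lim beta_i
b, J = entrance_kernel_basic(p, beta)
```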


EXAMPLE 2: The $p$-$q$ random walk.

The process $P$ is the Markov chain with state space the integers which moves one step to the right with probability $p$ or one step to the left with probability $q = 1 - p$. We shall assume $0 < q < p < 1$. A calculation like that in Section 5-8 shows that

$$H_{ij} = \begin{cases}1 & \text{if } i\le j,\\ (q/p)^{i-j} & \text{if } i\ge j,\end{cases}$$

and then

$$N_{ij} = \begin{cases}\dfrac{1}{p-q} & \text{if } i\le j,\\[1ex] \dfrac{(q/p)^{i-j}}{p-q} & \text{if } i\ge j.\end{cases}$$

Take $\pi$ and $f$ to be unit masses at state 0; as in Example 1, we may make these choices since all pairs of states communicate. We obtain

$$K(i,j) = \frac{N_{ij}}{N_{0j}} = \begin{cases}1 & \text{for } j\ge i\text{ and } j\ge 0,\\ (q/p)^i & \text{for } j\le i\text{ and } j\le 0.\end{cases}$$

There are two distinct infinite Cauchy sequences, one corresponding to $+\infty$ and the other corresponding to $-\infty$. For these points,

$$K(i,+\infty) = 1,\qquad K(i,-\infty) = (q/p)^i.$$
Both functions are regular and minimal, and thus the boundary contains two points, both extreme. No point of $S$ is a limit point. Since the process goes to $+\infty$ with probability one, we have $\mu(\{+\infty\}) = 1$. (The concentration of $\mu$ can also be deduced analytically from the representation of 1.)
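The closed forms for $N$ and $K$ can be spot-checked on a window of the integers. This Python sketch uses $p = 0.7$ as an illustrative value, and the function names are hypothetical. It verifies $N_{ij} - pN_{i+1,j} - qN_{i-1,j} = \delta_{ij}$ and the two boundary values of $K$. As in Example 1, the row identity alone does not single out $N$. It only confirms that the displayed formulas are consistent with $P$.

```python
def pq_N(p, i, j):
    """Closed form N_ij of the p-q random walk on the integers, 0 < q = 1 - p < p."""
    q = 1.0 - p
    if i <= j:
        return 1.0 / (p - q)
    return (q / p) ** (i - j) / (p - q)

def pq_K(p, i, j):
    """Martin kernel K(i, j) = N_ij / N_0j for pi = unit mass at 0."""
    return pq_N(p, i, j) / pq_N(p, 0, j)
```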

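The entrance boundary is treated most quickly by reversing the process with respect to the regular measure $\alpha = 1^T$. Then $\hat P$ is a process of the same type, but with the roles of $p$ and $q$ interchanged. Consequently

$$J(-\infty,j) = 1,\qquad J(+\infty,j) = (p/q)^j,$$

and $\mu^\alpha(\{-\infty\}) = \hat\mu(\{-\infty\}) = 1$. The function $p(x)$, which does not depend upon $\alpha$, satisfies

$$p(-\infty) = \lim_{E\uparrow S}\lim_{j\to-\infty}\frac{h^E_j}{N_{j0}} = \lim_{E\uparrow S}\frac{1}{N_{00}} = p - q,$$

since $h^E_j = 1$ for $j$ to the left of $E$ and $N_{j0} = N_{00}$ for $j\le 0$. For $p(+\infty)$, we note that if $j$ is large, then $h^E_j = H_{jm}$, where $m$ is the last state in $E$. Also

$$\frac{h^E_j}{N_{j0}} = \frac{H_{jm}}{N_{j0}} = \frac{(q/p)^{j-m}}{(q/p)^j/(p-q)} = (p-q)(p/q)^m.$$

As $E\uparrow S$, $m\to+\infty$, and therefore

$$p(+\infty) = +\infty.$$

Any extended chain representing $1^T$ starts at $-\infty$ with probability $p - q$ and goes to $+\infty$.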

The $p$-$q$ random walk can be used to show what happens if $\pi$ is chosen in such a way that there is a $P$-regular function which is not $\pi$-integrable. In fact, set

$$\pi_i = \begin{cases}(p/q)^i\,(p-q)/p & \text{for } i\le 0,\\ 0 & \text{for } i > 0.\end{cases}$$

Then $\pi 1 = 1$, and the regular function $(q/p)^i$ will not be integrable. We must recompute $K(i,j)$. First we note that for $j\ge 0$,

$$(\pi N)_j = \sum_{i\le 0}\pi_iN_{ij} = (p-q)^{-1}\sum_{i\le 0}\pi_i = (p-q)^{-1},$$

and that for $j\le 0$,

$$\begin{aligned}
(\pi N)_j &= (p-q)^{-1}\sum_{i\le 0}\pi_iH_{ij} = p^{-1}\sum_{i\le 0}(p/q)^iH_{ij}\\
&= p^{-1}\Big[\sum_{i\le j}(p/q)^i + \sum_{j<i\le 0}(p/q)^i(q/p)^{i-j}\Big]\\
&= (p-q)^{-1}(p/q)^j + p^{-1}|j|(p/q)^j.
\end{aligned}$$

Hence

$$K(i,j) = \frac{N_{ij}}{(\pi N)_j} = \begin{cases}1 & \text{for } j\ge i\text{ and } j\ge 0,\\[1ex] \dfrac{(q/p)^i}{1 + (p-q)p^{-1}|j|} & \text{for } j\le i\text{ and } j\le 0.\end{cases}$$

Again there are two boundary points, $+\infty$ and $-\infty$, but this time

$$K(i,+\infty) = 1\qquad\text{and}\qquad K(i,-\infty) = 0.$$

Thus $-\infty$ is not an extreme boundary point. The nonintegrability of $(q/p)^i$ introduced a boundary point whose associated function was identically zero. This example is typical of the general situation, but we shall not pursue the details.
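The computation of $(\pi N)_j$ for this $\pi$ can be confirmed by brute-force summation over $i = 0, -1, \ldots, -L$, and the decay of $K(i,j)$ as $j\to-\infty$ can be observed directly. In the sketch, $p = 0.7$ and the truncation level $L$ are illustrative, and the function names are hypothetical.

```python
import numpy as np

def pi_weights(p, L):
    """pi_i = (p/q)^i (p - q)/p on i = 0, -1, ..., -L (pi_i = 0 for i > 0)."""
    q = 1.0 - p
    i = -np.arange(L + 1)
    return i, (p / q) ** i * (p - q) / p

def pq_N(p, i, j):
    """N_ij = H_ij / (p - q) with H_ij = 1 (i <= j) and (q/p)^(i - j) (i >= j)."""
    q = 1.0 - p
    H = np.where(i <= j, 1.0, (q / p) ** np.abs(i - j))
    return H / (p - q)

def piN_truncated(p, j, L=2000):
    """(pi N)_j computed by summing over i = 0, -1, ..., -L."""
    i, w = pi_weights(p, L)
    return float(np.sum(w * pq_N(p, i, j)))

def piN_closed(p, j):
    """Closed form: 1/(p - q) for j >= 0, (p/q)^j [1/(p - q) + |j|/p] for j <= 0."""
    q = 1.0 - p
    if j >= 0:
        return 1.0 / (p - q)
    return (p / q) ** j * (1.0 / (p - q) + abs(j) / p)
```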

There is still a second way a zero boundary point can arise, and the degenerate case where $p = 1$ and $q = 0$ in the example above illustrates the point. If $h$ is regular for this process, then

$$h_i = P_{i,i+1}h_{i+1} = h_{i+1},$$

and hence $h$ must be constant. On the other hand, it is readily seen that, for any choice of $\pi$ for which $\pi N > 0$, there are two boundary points, $+\infty$ and $-\infty$, and that

$$K(i,+\infty) = 1\qquad\text{and}\qquad K(i,-\infty) = 0.$$

Thus again there is a zero boundary point, but this time no function is missing.


EXAMPLE 3: Symmetric random walk in three dimensions.

The symmetric random walk in three dimensions is the sums of independent random variables process on the integer lattice in three-dimensional space with transition probability $\frac16$ from any point to each of its six neighbors. We recall from Proposition 5-20 that the only bounded regular functions for this process are the constants. We now prove, using in part the methods of transient boundary theory, the deeper result that the only non-negative regular functions are the constants.

In fact, if we choose $\pi$ to be a unit mass at 0, then every regular function is $\pi$-integrable, and the representation theorem assures us that it is enough to prove that $K(\cdot,x) = 1$ for every $x$ in the boundary. That is, it suffices to prove that for every $i$

$$\lim_{j\to\infty}K(i,j) = 1.$$

We shall prove in a moment the following estimate for $N_{0j}$: if $j\ne 0$, then

$$N_{0j} = \frac{3}{2\pi|j|} + O(|j|^{-2}).$$

Once we have this estimate, we also have, for $i\ne j$ (using $N_{ij} = N_{0,j-i}$, which holds because the walk is spatially homogeneous),

$$K(i,j) = \frac{N_{0,j-i}}{N_{0j}} = \frac{\frac{3}{2\pi|j-i|} + O(|j-i|^{-2})}{\frac{3}{2\pi|j|} + O(|j|^{-2})} = \frac{|j|}{|j-i|} + O(|j|^{-1}).$$

Hence $\lim_jK(i,j) = 1$, as asserted.
Thus we are to prove the estimate for $N_{0j}$. We first show that

$$N_{0j} = 3\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}\frac{e^{-2\pi ij\cdot x}\,dx_1\,dx_2\,dx_3}{3 - \cos 2\pi x_1 - \cos 2\pi x_2 - \cos 2\pi x_3},$$

where $x = (x_1, x_2, x_3)$. Call the right side of the equation $h_j$. The singularity in the denominator of the integrand is of the order of

$$\frac{1}{3 - (1 - 2\pi^2x_1^2) - (1 - 2\pi^2x_2^2) - (1 - 2\pi^2x_3^2)} = \frac{1}{2\pi^2|x|^2}$$

near $x = 0$, and $|x|^{-2}$ is integrable in a neighborhood of $x = 0$. Thus $h_j$ is certainly well defined, and moreover $h_j\to 0$ as $j\to\infty$ by the Riemann-Lebesgue Lemma. We claim that

$$((I-P)h)_m = \delta_{m0}.$$
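Call the six neighbors of 0 by the names $a_s$, and call the region

$$-\tfrac12\le x_1, x_2, x_3\le\tfrac12$$

by the name $E$. Then

$$\begin{aligned}
((I-P)h)_m &= h_m - \frac16\sum_sh_{m+a_s}\\
&= 3\int_E(3 - \cos 2\pi x_1 - \cos 2\pi x_2 - \cos 2\pi x_3)^{-1}\Big(e^{-2\pi im\cdot x} - \frac16\sum_se^{-2\pi i(m+a_s)\cdot x}\Big)\,dx\\
&= 3\int_E(3 - \cos 2\pi x_1 - \cos 2\pi x_2 - \cos 2\pi x_3)^{-1}e^{-2\pi im\cdot x}\Big(1 - \frac16\sum_se^{-2\pi ia_s\cdot x}\Big)\,dx\\
&= \int_Ee^{-2\pi im\cdot x}\,dx\\
&= \delta_{m0}.
\end{aligned}$$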
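The next-to-last step rests on the identity $\frac16\sum_se^{-2\pi ia_s\cdot x} = \frac13(\cos 2\pi x_1 + \cos 2\pi x_2 + \cos 2\pi x_3)$. That identity makes the factor in parentheses equal to one third of the denominator. The following spot check at random points of $E$ is only a numerical illustration, and the names are hypothetical.

```python
import numpy as np

# The six neighbours a_s of the origin in Z^3.
NEIGHBOURS = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                       [0, -1, 0], [0, 0, 1], [0, 0, -1]])

def symbol_lhs(x):
    """1 - (1/6) sum_s exp(-2 pi i a_s . x) at the point x in R^3."""
    phases = np.exp(-2j * np.pi * (NEIGHBOURS @ x))
    return 1.0 - phases.mean()

def symbol_rhs(x):
    """(3 - cos 2 pi x1 - cos 2 pi x2 - cos 2 pi x3) / 3."""
    return (3.0 - np.cos(2 * np.pi * x).sum()) / 3.0
```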


Now $N_{0\cdot} = N_{\cdot 0}$ is also a function tending to zero (see Proposition 7-10) and satisfying

$$((I-P)N_{\cdot 0})_m = \delta_{m0},$$

and thus $N_{\cdot 0} - h$ is a bounded regular function. By Proposition 5-20 it must be constant, and that constant must be zero, since both functions vanish at infinity. Thus the formula for $N_{0j}$ is established.

We now have $N_{0j}$ exhibited as the $j$th Fourier coefficient of a certain periodic function of three variables. The remainder of the proof will presuppose some knowledge of Fourier transforms and tempered distributions.

Let $\varphi(x)$ be an infinitely differentiable function defined on $R^3$ such that

(1) $0\le\varphi\le 1$.
(2) $\varphi(x) = 1$ for $x$ in a small neighborhood of 0.
(3) $\varphi(x) = 0$ outside of a slightly larger neighborhood of 0.

For simplicity, let $\|x\|^4 = x_1^4 + x_2^4 + x_3^4$. Denote by $f(x)$ the periodic continuation of the function

$$\varphi(x)\left[\frac{3}{2\pi^2|x|^2} + \frac{\|x\|^4}{2|x|^4}\right].$$

The difference

$$f(x) - \frac{3}{3 - \cos 2\pi x_1 - \cos 2\pi x_2 - \cos 2\pi x_3}$$

is a bounded periodic function which is infinitely differentiable away from 0 and is $O(|x|^2)$ near 0. Moreover, its Laplacian is integrable in $E - \{0\}$. These facts imply that the Fourier coefficients of the difference are $O(|j|^{-2})$ as $j\to\infty$. Hence it is enough to show that the Fourier coefficients of $f$ are equal to

$$\frac{3}{2\pi|j|} + O(|j|^{-2}).$$
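The claim that the difference is $O(|x|^2)$ near 0 comes from the expansion $3 - \sum_k\cos 2\pi x_k = 2\pi^2|x|^2 - \frac{2\pi^4}{3}\|x\|^4 + O(|x|^6)$, since $\varphi = 1$ near 0. The sketch below evaluates the remainder divided by $|x|^2$ along random directions as $|x|$ shrinks. It computes $1 - \cos\theta$ as $2\sin^2(\theta/2)$ to avoid cancellation. This is a numerical illustration, not a proof, and the names are hypothetical.

```python
import numpy as np

def denominator(x):
    """3 - sum_k cos(2 pi x_k), computed stably as sum_k 2 sin^2(pi x_k)."""
    return np.sum(2.0 * np.sin(np.pi * x) ** 2)

def singular_part(x):
    """3 / (2 pi^2 |x|^2) + ||x||^4 / (2 |x|^4), where ||x||^4 = x1^4 + x2^4 + x3^4."""
    r2 = np.dot(x, x)
    return 3.0 / (2.0 * np.pi ** 2 * r2) + np.sum(x ** 4) / (2.0 * r2 ** 2)

def remainder_ratio(x):
    """(3 / denominator - singular_part) / |x|^2, which should stay bounded near 0."""
    return (3.0 / denominator(x) - singular_part(x)) / np.dot(x, x)
```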
It is clear from the definitions that the Fourier coefficients of $f$ satisfy

$$\int_Ef(x)e^{-2\pi ij\cdot x}\,dx = \int_{R^3}f(x)e^{-2\pi ij\cdot x}\,dx,$$

where on the right $f$ stands for the function $\varphi(x)[3/(2\pi^2|x|^2) + \|x\|^4/(2|x|^4)]$ itself, which vanishes outside the support of $\varphi$. Here the right side is the Fourier transform of $f$ evaluated at $j$. We are thus to estimate the Fourier transform of $f$. Write

$$f(x) = [\varphi(x) - 1]\left[\frac{3}{2\pi^2|x|^2} + \frac{\|x\|^4}{2|x|^4}\right] + \frac{3}{2\pi^2|x|^2} + \frac{\|x\|^4 - \frac35|x|^4}{2|x|^4} + \frac{3}{10}$$

and take the Fourier transform of both sides in the sense of distributions. We shall consider the four terms on the right separately.

The transform of the constant $\frac{3}{10}$ is the distribution defined by the measure which assigns weight $\frac{3}{10}$ to the origin. It has the property that it is supported, say, entirely in the unit ball.

In the third term, the numerator is a homogeneous harmonic polynomial, and the transform of the term is known from Fourier analysis to be

$$\frac{15}{8\pi}\,\mathrm{PV}\left[\frac{\|y\|^4 - \frac35|y|^4}{2|y|^7}\right],$$

where

$$(\mathrm{PV}[g],\psi) = \lim_{\varepsilon\to 0}\int_{|x|\ge\varepsilon}g(x)\psi(x)\,dx.$$

This distribution has the property that it is the sum of a function and a distribution which is supported in the unit ball. The function is

$$\frac{15}{8\pi}\cdot\frac{\|y\|^4 - \frac35|y|^4}{2|y|^7}\qquad\text{for }|y|\ge 1,$$

and it is $O(|y|^{-3})$ as $y\to\infty$.

The transform of the second term is known from Fourier analysis to be the distribution arising from the function

$$\frac{3}{2\pi|y|}.$$
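Both transforms quoted above appear to be instances of the classical formula for homogeneous functions found in Stein's treatment of singular integrals. We state it from memory, as an assumption, for the convention $\hat g(y) = \int g(x)e^{-2\pi ix\cdot y}dx$. For a homogeneous harmonic polynomial $P_k$ of degree $k$ on $R^n$, the transform of $P_k(x)|x|^{-k-n+\alpha}$ is $\gamma_{k,\alpha}P_k(y)|y|^{-k-\alpha}$, with $\gamma_{k,\alpha} = i^{-k}\pi^{n/2-\alpha}\Gamma(\frac{k+\alpha}{2})/\Gamma(\frac{k+n-\alpha}{2})$. The third term uses the endpoint case $\alpha = n$, where the transform is read as a principal value. The sketch evaluates the constants for the second term ($k = 0$, $\alpha = 1$) and the third term ($k = 4$, $\alpha = 3$). It also checks by finite differences that $\|x\|^4 - \frac35|x|^4$ is harmonic. The names are hypothetical.

```python
import math

def riesz_constant(n, k, alpha):
    """gamma = i^(-k) pi^(n/2 - alpha) Gamma((k + alpha)/2) / Gamma((k + n - alpha)/2)
    for even k (so i^(-k) = (-1)^(k/2)); this formula is an assumption, stated from
    memory, for FT[P_k(x) |x|^(-k-n+alpha)] = gamma P_k(y) |y|^(-k-alpha)."""
    sign = (-1) ** (k // 2)
    return (sign * math.pi ** (n / 2 - alpha)
            * math.gamma((k + alpha) / 2) / math.gamma((k + n - alpha) / 2))

def harmonic_poly(x):
    """||x||^4 - (3/5)|x|^4 with ||x||^4 = x1^4 + x2^4 + x3^4."""
    r2 = sum(c * c for c in x)
    return sum(c ** 4 for c in x) - 0.6 * r2 ** 2

def laplacian_fd(g, x, h=1e-3):
    """Central second differences summed over the three coordinates."""
    total = 0.0
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        total += (g(xp) - 2.0 * g(x) + g(xm)) / h ** 2
    return total
```

With these helpers, `riesz_constant(3, 4, 3)` reproduces $\frac{15}{8\pi}$. Also, $\frac{3}{2\pi^2}$ times `riesz_constant(3, 0, 1)` (which equals $\pi$) reproduces $\frac{3}{2\pi}$.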
We come to the first term. Since the transform of the left side is a function, the transform of the first term must be the sum of a function and a distribution supported in the unit ball. On the other hand, the first term is an infinitely differentiable function. If we iterate taking the Laplacian of it enough times, the resulting function is integrable and has a bounded function as Fourier transform. Since the operation of taking the Laplacian goes into multiplication by $-4\pi^2|y|^2$ under the Fourier transform, we obtain the inequality of functions
$$\left|(4\pi^2|y|^2)^k\,\mathrm{FT}(\text{first term})\right| \le A_k, \qquad k \text{ sufficiently large}.$$
This result implies that the function part of the distribution $\mathrm{FT}(\text{first term})$ is $O(|y|^{-2k})$ as $y \to \infty$.


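Since the sum of the transforms of the four terms is a function, we must have, outside the unit ball,
$$\mathrm{FT}(f)(y) = O(|y|^{-2k}) + \frac{3}{2\pi|y|} + O(|y|^{-3}).$$
We conclude therefore that
$$\mathrm{FT}(f)(j) = \frac{3}{2\pi|j|} + O(|j|^{-3})$$
and hence that the Fourier coefficients of $f$ are
$$\frac{3}{2\pi|j|} + O(|j|^{-2}),$$
as required.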


EXAMPLE 4: 


This example will be a process with an exit boundary point $x$ such that $K(\cdot, x)$ is regular and normalized but is not minimal. The state space will be
$$S = \{a_i, a'_i, b_i \mid i = 0, 1, 2, \ldots\},$$
and the non-zero transition probabilities are, for $i \ge 1$,
$$P_{a_{i-1}a_i} = p_i, \qquad P_{a_{i-1}b_{i-1}} = q_i = 1 - p_i,$$
$$P_{b_{i-1}b_i} = r_i,$$
$$P_{a'_{i-1}a'_i} = p_i, \qquad P_{a'_{i-1}b_{i-1}} = q_i = 1 - p_i.$$

The picture for this process is as follows. 
[Figure: the rows of states $a_0, a_1, a_2, \ldots$; $b_0, b_1, b_2, \ldots$; and $a'_0, a'_1, a'_2, \ldots$.]
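Let
$$\beta_0 = 1, \qquad \beta_i = \prod_{k=1}^{i} p_k, \qquad \beta_\infty = \lim_{i\to\infty} \beta_i,$$
$$\gamma_0 = 1, \qquad \gamma_i = \prod_{k=1}^{i} r_k,$$
$$\sigma_0 = q_1, \qquad \sigma_i = \sum_{k=0}^{i} \frac{\beta_k q_{k+1}}{\gamma_k}, \qquad \sigma_\infty = \lim_{i\to\infty} \sigma_i.$$
We shall assume for this example that $\sigma_\infty = +\infty$. At the end of the discussion of the example we shall show that this condition is possible.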
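Any state can be reached either from $a_0$ or from $a'_0$. Thus if we set
$$\pi_{a_0} = \pi_{a'_0} = \tfrac12,$$
then $\pi N$ is strictly positive. Since the process is in any given state at most once, we must have $H = N$. It is clear that
$$H_{a_ia_j} = H_{a'_ia'_j} = \begin{cases} 0 & \text{if } j < i, \\ \beta_j/\beta_i & \text{if } j \ge i, \end{cases}
\qquad
H_{b_ib_j} = \begin{cases} 0 & \text{if } j < i, \\ \gamma_j/\gamma_i & \text{if } j \ge i, \end{cases}$$
and $H_{a_ia'_j} = H_{a'_ia_j} = 0 = H_{b_ia_j} = H_{b_ia'_j}$. We must compute $H_{a_ib_j}$ (and $H_{a'_ib_j}$). If $j < i$, then $H_{a_ib_j} = 0$. Otherwise we obtain a set of alternatives by considering the first time the process switches from the $a$-row to the $b$-row. Then
$$H_{a_ib_j} = \sum_{k=i}^{j} H_{a_ia_k}\,q_{k+1}\,H_{b_kb_j}.$$
Substituting, we find
$$H_{a_ib_j} = H_{a'_ib_j} = \begin{cases} 0 & \text{if } j < i, \\ \dfrac{\gamma_j}{\beta_i}\displaystyle\sum_{k=i}^{j}\frac{\beta_k q_{k+1}}{\gamma_k} & \text{if } j \ge i. \end{cases}$$
In particular, $H_{a_0b_j} = H_{a'_0b_j} = \gamma_j\sigma_j$.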
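In determining the exit boundary we use the observation that whenever finitely many Cauchy sequences exhaust $S$, then those Cauchy sequences define all of the boundary points. In the present example the claim is that the three sequences $\{a_j\}$, $\{a'_j\}$, and $\{b_j\}$ are each Cauchy sequences.

First consider $K(\cdot, a_j)$ for large $j$. We have
$$K(a'_i, a_j) = K(b_i, a_j) = 0$$
and, for $i \le j$,
$$K(a_i, a_j) = \frac{H_{a_ia_j}}{(\pi H)_{a_j}} = \frac{\beta_j/\beta_i}{\frac12\beta_j} = \frac{2}{\beta_i}.$$
Hence $a_\infty = \lim a_j$ is a boundary point, and it satisfies
$$K(a_i, a_\infty) = \frac{2}{\beta_i} \quad\text{and}\quad K(a'_i, a_\infty) = K(b_i, a_\infty) = 0.$$
We can check directly that $K(\cdot, a_\infty)$ is a regular function; moreover, it is normalized because
$$\pi K(\cdot, a_\infty) = \tfrac12 K(a_0, a_\infty) = \tfrac12 \cdot 2 = 1.$$

Similarly we find that $\{a'_j\}$ is a Cauchy sequence. If we call its limit $a'_\infty$, then
$$K(a'_i, a'_\infty) = \frac{2}{\beta_i} \quad\text{and}\quad K(a_i, a'_\infty) = K(b_i, a'_\infty) = 0.$$
Since $K(\cdot, a_\infty) \ne K(\cdot, a'_\infty)$, $a_\infty$ and $a'_\infty$ are distinct boundary points. We see also that $K(\cdot, a'_\infty)$ is regular and normalized.

Finally we consider the sequence $\{b_j\}$ for large $j$. For $i \le j$, we have
$$K(a_i, b_j) = K(a'_i, b_j) = \frac{H_{a_ib_j}}{(\pi H)_{b_j}} = \frac{1}{\beta_i\sigma_j}\sum_{k=i}^{j}\frac{\beta_k q_{k+1}}{\gamma_k}$$
and
$$K(b_i, b_j) = \frac{H_{b_ib_j}}{(\pi H)_{b_j}} = \frac{\gamma_j/\gamma_i}{\gamma_j\sigma_j} = \frac{1}{\gamma_i\sigma_j}.$$
Since, by hypothesis, $\sigma_\infty = +\infty$, we have
$$\lim_j K(a_i, b_j) = \lim_j K(a'_i, b_j) = \frac{1}{\beta_i}$$
and
$$\lim_j K(b_i, b_j) = 0.$$
Therefore $\{b_j\}$ is Cauchy. Denoting its limit in $S^*$ by $b_\infty$, we conclude
$$K(a_i, b_\infty) = K(a'_i, b_\infty) = \frac{1}{\beta_i} \quad\text{and}\quad K(b_i, b_\infty) = 0.$$
We can check that $K(\cdot, b_\infty)$ is regular and normalized.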
Thus there are exactly three boundary points, and each is associated with a regular normalized function. But
$$K(\cdot, b_\infty) = \tfrac12 K(\cdot, a_\infty) + \tfrac12 K(\cdot, a'_\infty),$$
and $b_\infty$ therefore cannot be an extreme point. Then by the representation theorem all non-negative regular functions are generated by $K(\cdot, a_\infty)$ and $K(\cdot, a'_\infty)$. Hence their linear independence as functions on $S$ implies that they are both minimal. That is, $B^e = \{a_\infty, a'_\infty\}$.

In this example, $\beta_\infty$ is positive if and only if harmonic measure assigns positive weight to the boundary. In fact,
$$\mu(\{a_\infty\}) = \tfrac12\beta_\infty = \mu(\{a'_\infty\}).$$
The remainder of the weight is put on the states from which the process can disappear. We obtain
$$\mu(\{b_j\}) = \Pr_\pi[\text{process disappears from } b_j] = (\gamma_j - \gamma_{j+1})\sigma_j.$$
Note that if $\beta_\infty = 0$, then there exist non-negative regular functions, but none of them is bounded.

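We still must show that the condition $\sigma_\infty = +\infty$ puts no restriction on whether $\beta_\infty$ must be positive. To get $\beta_\infty = 0$, take
$$p_i = q_i = r_i = \tfrac12.$$
Then $\beta_j = \gamma_j = 2^{-j}$ and $\sigma_j = \tfrac12(j + 1)$. Hence $\sigma_\infty = +\infty$. To get $\beta_\infty > 0$, take
$$q_j = \frac{1}{(j+1)^2}, \qquad p_j = 1 - \frac{1}{(j+1)^2}, \qquad r_j = \frac{j+1}{j+2}.$$
Then
$$\beta_\infty = \prod_{j=1}^{\infty}\left(1 - \frac{1}{(j+1)^2}\right) = \frac12 \quad\text{and}\quad \gamma_j = \frac{2}{j+2},$$
and, since $\beta_j \ge \beta_\infty$,
$$\sigma_\infty = \sum_{j=0}^{\infty}\frac{\beta_j q_{j+1}}{\gamma_j} \ge \sum_{j=0}^{\infty}\frac12\cdot\frac{1}{(j+2)^2}\cdot\frac{j+2}{2} = \sum_{j=0}^{\infty}\frac{1}{4(j+2)} = +\infty.$$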
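The formula $H_{a_0b_j} = \gamma_j\sigma_j$ and the two parameter choices above can be checked with exact rational arithmetic. The sketch below (function names hypothetical) propagates hitting probabilities forward along the rows and compares them with the closed forms; it assumes the transition structure $a_{i-1} \to a_i$, $a_{i-1} \to b_{i-1}$, $b_{i-1} \to b_i$ stated at the start of the example, with lists indexed so that `p[i]` is $p_i$ and index 0 is a placeholder.

```python
from fractions import Fraction


def forward_hitting(p, r, n):
    """H_{a_0 a_k} and H_{a_0 b_k}, k = 0..n, by forward propagation.

    Each state is visited at most once, so these are also entries of N.
    """
    hit_a = [Fraction(0)] * (n + 1)
    hit_b = [Fraction(0)] * (n + 1)
    hit_a[0] = Fraction(1)
    for k in range(n + 1):
        if k >= 1:
            hit_a[k] = hit_a[k - 1] * p[k]  # a_{k-1} -> a_k with prob p_k
        # b_k is entered from a_k (prob q_{k+1}) or from b_{k-1} (prob r_k)
        from_row_b = hit_b[k - 1] * r[k] if k >= 1 else Fraction(0)
        hit_b[k] = hit_a[k] * (1 - p[k + 1]) + from_row_b
    return hit_a, hit_b


def closed_form(p, r, n):
    """beta_k, gamma_k, sigma_k (k = 0..n) as defined in the text."""
    beta, gamma, sigma = [Fraction(1)], [Fraction(1)], []
    for i in range(1, n + 1):
        beta.append(beta[-1] * p[i])
        gamma.append(gamma[-1] * r[i])
    running = Fraction(0)
    for k in range(n + 1):
        running += beta[k] * (1 - p[k + 1]) / gamma[k]
        sigma.append(running)
    return beta, gamma, sigma


def kernel_a_b(beta, sigma, i, j):
    """K(a_i, b_j) for i <= j, using sum_{k=i}^{j} beta_k q_{k+1}/gamma_k = sigma_j - sigma_{i-1}."""
    partial = sigma[j] - (sigma[i - 1] if i >= 1 else 0)
    return partial / (beta[i] * sigma[j])


def halves(n):
    """p_i = q_i = r_i = 1/2 (the case beta_infinity = 0)."""
    half = Fraction(1, 2)
    return [half] * (n + 2), [half] * (n + 1)


def positive_beta_example(n):
    """q_j = 1/(j+1)^2 and r_j = (j+1)/(j+2) (the case beta_infinity = 1/2)."""
    p = [Fraction(0)] + [1 - Fraction(1, (j + 1) ** 2) for j in range(1, n + 2)]
    r = [Fraction(0)] + [Fraction(j + 1, j + 2) for j in range(1, n + 1)]
    return p, r
```

For the halves example, `kernel_a_b(beta, sigma, 1, j)` equals $2j/(j+1)$, which visibly tends to $1/\beta_1 = 2$.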
EXAMPLE 5: 


This example will be a process in which S* has every point of S as a 
limit point. 
Let S be the set of all finite strings of positive integers of the form 


$$(1, k_1, \ldots, k_n)$$
as $n$ and the $k$'s vary. The transition probability from $(1, k_1, \ldots, k_n)$ to $(1, k_1, \ldots, k_n, m)$ is to be $2^{-m}$, and all other transition probabilities are zero.

Since each state is entered at most once, we have $N = H$. The hitting probability from
$$I = (1, j_1, \ldots, j_n)$$
to a distinct state
$$J = (1, k_1, \ldots, k_{n'})$$
is
$$H_{IJ} = \begin{cases} 2^{-(k_{n+1} + \cdots + k_{n'})} & \text{if } n < n' \text{ and } j_1 = k_1, \ldots, j_n = k_n, \\ 0 & \text{otherwise.} \end{cases}$$
Take $\pi$ to be a unit mass at $(1)$. If
$$O = (1), \qquad J = (1, k_1, \ldots, k_n), \qquad J_m = (1, k_1, \ldots, k_n, m),$$
the claim is that
$$\lim_{m\to\infty} K(I, J_m) = K(I, J)$$
for all $I$. It would therefore follow that every point $J$ in $S$ is a limit point of $S^*$.

For the proof we observe first that $K(I, J) = K(I, J_m) = 0$ unless $I$ is an initial segment of some $J_m$. Also if $I = J_{m_0}$ for some $m_0$, then
$$K(I, J) = \lim_m K(I, J_m) = 0.$$
Thus we may suppose that $I$ is an initial segment of $J$. In this case, if $I = (1, k_1, \ldots, k_r)$,
$$K(I, J_m) = \frac{H_{IJ_m}}{H_{OJ_m}} = 2^{k_1 + \cdots + k_r}$$
and
$$K(I, J) = \frac{H_{IJ}}{H_{OJ}} = 2^{k_1 + \cdots + k_r}.$$
Hence $K(I, J) = \lim_m K(I, J_m)$ always.
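Because every hitting probability in this example is an explicit power of 2, the kernel can be computed directly. A minimal Python sketch (function names hypothetical), with states represented as tuples beginning with 1 and $\pi$ a unit mass at $(1)$:

```python
from fractions import Fraction


def hitting(I, J):
    """H_{IJ}: 2^{-(sum of the entries appended to I)} if I is an initial segment of J."""
    if I == J:
        return Fraction(1)
    if len(I) < len(J) and J[:len(I)] == I:
        return Fraction(1, 2 ** sum(J[len(I):]))
    return Fraction(0)


def kernel(I, J, origin=(1,)):
    """K(I, J) = H_{IJ} / H_{OJ}, with pi a unit mass at O = (1)."""
    return hitting(I, J) / hitting(origin, J)


if __name__ == "__main__":
    J = (1, 3, 2)
    # K((1,3), J_m) does not depend on m and equals K((1,3), J) = 2^3.
    print([kernel((1, 3), J + (m,)) for m in range(1, 6)], kernel((1, 3), J))
```

In fact $K(I, J_m) = K(I, J)$ for every $m$ once $I$ is an initial segment of $J$, so the limit is attained exactly.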


EXAMPLE 6: Space-time coin tossing. 


Let $S$ be the set of lattice points $(n, i)$ in the plane with $0 \le i \le n$. A point of $S$ is to be identified with $i$ heads in $n$ tosses of a fair coin. Thus we take
$$P_{(n,i)(n+1,i)} = P_{(n,i)(n+1,i+1)} = \tfrac12,$$
with all other transition probabilities equal to zero. In this example, $N = H$ and also
$$H_{(m,i)(n,j)} = \begin{cases} \binom{n-m}{j-i}2^{-(n-m)} & \text{if } n \ge m,\ j \ge i, \\ 0 & \text{otherwise.} \end{cases}$$
Let $\pi$ be a unit mass at $(0, 0)$. Then $\pi N > 0$ since every state can be reached from $(0, 0)$. We obtain
$$K((m, i), (n, j)) = \begin{cases} 2^m\binom{n-m}{j-i}\Big/\binom{n}{j} & \text{if } n \ge m,\ j \ge i, \\ 0 & \text{otherwise.} \end{cases}$$

As a first step in obtaining the exit boundary, we shall prove that if $(n_k, j_k)$ is Cauchy, then $\lim_k (j_k/n_k)$ exists. In fact, set $i = 0$ and $m = 1$ and consider, for $n_k \ge 1$, the expression
$$K((1, 0), (n_k, j_k)) = 2\binom{n_k - 1}{j_k}\Big/\binom{n_k}{j_k} = \frac{2(n_k - j_k)}{n_k} = 2 - 2\,\frac{j_k}{n_k}.$$
As $k \to \infty$, the left side converges; hence so does the right side, and $\lim j_k/n_k$ must exist.

Conversely suppose that $(n_k, j_k)$ is an infinite sequence such that $t = \lim j_k/n_k$ exists. We claim that $(n_k, j_k)$ is Cauchy. Fix $(m, i)$ and denote by $(n, j)$ a term of the sequence $(n_k, j_k)$. Then
$$K((m, i), (n, j)) = 2^m\,\frac{\binom{n-m}{j-i}}{\binom{n}{j}} = 2^m\,\frac{[j(j-1)\cdots(j-i+1)]\,[(n-j)(n-j-1)\cdots(n-j-m+i+1)]}{n(n-1)\cdots(n-m+1)}.$$
Noting that both numerator and denominator have exactly $m$ factors and that $m$ is fixed, divide the numerator and the denominator by $n^m$ and pass to the limit in each factor separately. In each factor we have $j/n \to t$ and, for instance, $(i-1)/n \to 0$. We get
$$\lim K((m, i), (n, j)) = 2^m t^i (1 - t)^{m-i}.$$

Therefore the classes of infinite Cauchy sequences are in one-to-one correspondence with the rays to infinity, that is, in one-to-one correspondence with points $t$ in $[0, 1]$. The functions associated to these sequences are
$$K((m, i), t) = 2^m t^i (1 - t)^{m-i}.$$
It is easy to check that all of these functions are regular and normalized, and hence the points in question form the entire boundary $B$ (and nothing more).

Next, we check that the topology of $B$ is the usual topology for the unit interval. The map of the unit interval onto $B$ is continuous since each function $K((m, i), \cdot)$ is continuous in the unit-interval topology. Since the map is one-to-one from a compact space onto a Hausdorff space, it is a homeomorphism.

To see that every point of $B$ is extreme, we assume, on the contrary, that $2^m t_0^i(1 - t_0)^{m-i}$ is not minimal. By Theorem 10-41 there is a measure $\nu$ on the Borel sets of $B^e \subset [0, 1]$ with
$$2^m t_0^i (1 - t_0)^{m-i} = \int_{B^e} 2^m t^i (1 - t)^{m-i}\,d\nu(t).$$
Specializing to the case $m = i$ and extending $\nu$ to be defined on $[0, 1]$, we find that
$$t_0^i = \int_0^1 t^i\,d\nu(t)$$
for all $i$. But by the Weierstrass Approximation Theorem, $\nu$ is completely determined by its integrals against all polynomials $t^i$. Hence $\nu$ is a point mass at $t_0$, and $t_0$ is extreme.

We can summarize our results so far by saying that no point of $S$ is a limit point, that the boundary $B$ is homeomorphic to the unit interval under the correspondence
$$K((m, i), t) = 2^m t^i (1 - t)^{m-i},$$
and that every point of the boundary is extreme.

The Strong Law of Large Numbers, when applied to coin tossing, is equivalent to the statement that harmonic measure $\mu$ concentrates all its mass at $t = \frac12$. We can verify this result directly: We do have
$$\int_0^1 K((m, i), t)\,d\delta_{1/2}(t) = 2^m\left(\tfrac12\right)^i\left(\tfrac12\right)^{m-i} = 1,$$
and, by the uniqueness half of Theorem 10-41, $\delta_{1/2}$ can be the only measure yielding $\mathbf{1}$. Hence $\delta_{1/2}$ is harmonic measure; that is, harmonic measure concentrates all its mass at $t = \frac12$.

By Theorem 10-41 every measure $\mu^h$ on the unit interval gives rise to a non-negative regular function $h$ for the process, and $\mu^h$ is harmonic measure for the $h$-process. We shall discuss only two special cases.

If $\mu^h$ is the unit mass at a point $t_0$ other than 0 or 1, then
$$h_{(m,i)} = 2^m t_0^i (1 - t_0)^{m-i}$$
and
$$P^h_{(m,i)(m+1,i)} = 1 - t_0, \qquad P^h_{(m,i)(m+1,i+1)} = t_0.$$
Thus the $h$-process is space-time coin tossing in which there is probability $t_0$ of heads and probability $1 - t_0$ of tails. The fact that $\mu^h(\{t_0\}) = 1$ means that with probability one the ratio of heads to total tosses tends to $t_0$, again in agreement with the Strong Law of Large Numbers.

If, instead, $\mu^h$ is Lebesgue measure, then
$$h_{(m,i)} = 2^m \int_0^1 t^i (1 - t)^{m-i}\,dt = 2^m\,\frac{\Gamma(i+1)\,\Gamma(m-i+1)}{\Gamma(m+2)} = 2^m\,\frac{i!\,(m-i)!}{(m+1)!}.$$
Consequently, the transition probabilities in the $h$-process are again computable, and we find
$$P^h_{(m,i)(m+1,i)} = \frac{m - i + 1}{m + 2}$$
and
$$P^h_{(m,i)(m+1,i+1)} = \frac{i + 1}{m + 2}.$$

The significance of this process is discussed in Problems 28 and 29.
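The main formulas of this example can be verified with exact arithmetic. The following Python sketch (function names hypothetical) checks the closed form of $K((1,0),(n,j))$, the regularity of $K(\cdot, t)$, the approximation $K((m,i),(n,j)) \approx 2^m t^i (1-t)^{m-i}$ when $j/n \approx t$, and the $h$-process transition probabilities when $\mu^h$ is Lebesgue measure.

```python
from fractions import Fraction
from math import comb, factorial


def kernel(m, i, n, j):
    """K((m,i),(n,j)) with pi a unit mass at (0,0); states satisfy 0 <= i <= m, 0 <= j <= n."""
    if n < m or j < i or j - i > n - m:
        return Fraction(0)
    return Fraction(2 ** m * comb(n - m, j - i), comb(n, j))


def boundary_kernel(m, i, t):
    """K((m,i), t) = 2^m t^i (1-t)^(m-i)."""
    return 2 ** m * t ** i * (1 - t) ** (m - i)


def is_regular_at(m, i, t):
    """Check K((m,i),t) = (1/2) K((m+1,i),t) + (1/2) K((m+1,i+1),t)."""
    half = Fraction(1, 2)
    return boundary_kernel(m, i, t) == half * (
        boundary_kernel(m + 1, i, t) + boundary_kernel(m + 1, i + 1, t)
    )


def h_lebesgue(m, i):
    """h_(m,i) = 2^m i! (m-i)! / (m+1)! when mu^h is Lebesgue measure."""
    return Fraction(2 ** m * factorial(i) * factorial(m - i), factorial(m + 1))


def h_transition_up(m, i):
    """P^h_{(m,i)(m+1,i+1)} = (1/2) h_(m+1,i+1) / h_(m,i)."""
    return Fraction(1, 2) * h_lebesgue(m + 1, i + 1) / h_lebesgue(m, i)


if __name__ == "__main__":
    print(float(kernel(2, 1, 10000, 3000)), boundary_kernel(2, 1, 0.3))
    print([h_transition_up(4, i) for i in range(5)])
```

The last line shows the Pólya urn probabilities $(i+1)/(m+2)$ of drawing a white ball, which is the connection exploited in Problem 28.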


14. Problems 


1. If $\pi$ assigns positive weight only to finitely many states, show that $\pi K(\cdot, x) = 1$ for every boundary point $x$. What is the corresponding condition so that $K(\cdot, x)$ is regular for every boundary point?


2. Consider the identity
$$\int_{S^*} PK(\cdot, x)\,d\mu(x) = P\int_{S^*} K(\cdot, x)\,d\mu(x) = P\mathbf{1}.$$
What can we conclude about $\mu$ if $P\mathbf{1} = \mathbf{1}$? What if $(P\mathbf{1})_i < 1$?
Problems 3 to 12 deal with sums of independent random variables on the 


integers with p_, = 3, pı = 3, pp = $. This example was discussed in 
Section 5-8, and the results there obtained will be useful. 


3. Find N. 

4. Let $\pi_0 = 1$, and compute $K(i, j)$.

5. Show that the two boundary points are $\pm\infty$, and find the two minimal functions.


6. From the form of these two functions determine what weight $\mu$ must put at each point. Does this agree with your understanding of the long-range behavior of the chain?


7. Let h; = (4). Find P” and pw”. Check the representation of h. 
8. Show that the $h$ of Problem 7 is minimal by proving that $P^h$ has only constants for bounded regular functions.


9. Let $f_i = \delta_{i0}$ and $g = Nf$. Find $J(i, j)$.


10. Show that the entrance boundary is “the same” as the exit boundary, and find the two minimal measures.


11. Find $\mu(-\infty)$ and $\mu(+\infty)$.


12. Construct extended chains representing the two minimal measures of 
Problem 10 (in the sense of Theorem 10-9). For each chain, verify that 


Priz, eC] = | pla)du"(e. 
Problems 13 to 20 refer to a “double basic example.” Let
$$S = \{0, 1, 2, \ldots, 0', 1', 2', \ldots\}.$$
The non-zero entries of P are 


Praia =p; Piro = l — Pp; Po-wy = Pr; Paes = l — Pr- 
Let 


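$$\beta_0 = \beta'_0 = 1, \qquad \beta_i = \prod_{k=1}^{i} p_k, \qquad \beta'_i = \prod_{k=1}^{i} p'_k,$$
$$\beta_\infty = \lim_i \beta_i, \qquad \beta'_\infty = \lim_i \beta'_i.$$
Assume that $\beta_\infty > 0$ and $\beta'_\infty > 0$.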


13. Find H. [Hint: Use Propositions 4-14 and 4-16.] 
14. Let $\pi_0 = 1$, and compute $K$.


15. Show that there are two boundary points which correspond to $\infty$ and $\infty'$, and find the minimal functions.


16. From the forms of the two minimal functions, find $\mu$. Give an intuitive interpretation for the two weights.


17. Find the most general non-negative normalized regular function $h$. Show that $h$ is a convex combination of the two minimal functions.


18. Show that any such $h$ tends to $\mu^h/\mu$ at each boundary point.


19. Let k = 4K(-, œ) + $K(-, —œ). Show that P” goes to œ with prob- 
ability + by computing p”(0o). 


20. For the example of Problem 19, compute P” explicitly if 


Verify the conclusion of Problem 19 by computing hë for 
$$E = \{i, i+1, i+2, \ldots\}$$
and by letting $i$ tend to $\infty$.


Problems 21 to 27 deal with a certain tree-process. The states are all finite sequences of H and T. The empty sequence is the starting state and is denoted state 0. From state $a = (a_1, a_2, \ldots, a_n)$ the process is equally likely to go to $(a_1, a_2, \ldots, a_n, H)$ or to $(a_1, a_2, \ldots, a_n, T)$.

21. Find N. 

22. Let $\pi_0 = 1$, and compute $K$.

23. Show that the boundary is the set of all infinite sequences of H and T.
24. What is the topology of the boundary? [It is not that of the unit interval.]
25. Use the identity
$$\Pr_a[x_\infty \in C] = \int_C K(a, x)\,d\mu(x)$$
to find the measure $\mu$.
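26. Let $h$ be defined by
$$h_a = \begin{cases} 1 & \text{if } a = 0, \\ 2 & \text{if } a = (H), \\ 4 & \text{if } a \text{ begins with } a_1 = a_2 = H, \\ 0 & \text{otherwise.} \end{cases}$$
Prove that $h$ is a normalized regular function. Find $P^h$ and $\mu^h$. Show that $h$ is continuous and that $h(x) = d\mu^h/d\mu$ for a.e. $x$.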


27. Let $f_a = 1/m$ if $a$ consists of exactly $m$ H's and $m$ T's, and let $f_a$ equal 0 otherwise. Show that $f$ is a charge, and verify that
$$\Pr[\lim g(x_n) = 0] = 1.$$
Problems 28 and 29 deal with an instance of Polya’s urn scheme. An urn 
contains some white balls and some black balls. A drawing is made with 
each ball equally likely to be drawn; the ball drawn is then replaced and 
another of the same color is added to the urn. This scheme is repeated over 
and over. 


28. Let the pair (m, i) stand for m + 2 balls in the urn with i + 1 of them 
white. Show that if the outcomes of the Polya urn scheme are taken as 
such pairs (m, i), then the resulting process is a Markov chain. Note 
that the transition matrix for this chain is identical with the one for the 
h-process considered at the very end of Section 13. 


29. Let the scheme be started with 1 white ball and 1 black ball, and let $r_m$ be the fraction of white balls at time $m$. Use the observation in Problem 28 to compute
$$\limsup_m \Pr[|r_m - \tfrac12| < \epsilon].$$


Problems 30 to 34 establish the necessity of a necessary and sufficient condition that a transient chain $P$ with all pairs of states communicating have a non-zero non-negative regular measure $\alpha$. Number the states $0, 1, 2, \ldots$. Let ${}^kL_{ij}$ be the probability that the process started at $i$ reaches $j$ and that the first visit is immediately preceded by a visit to a state $\ge k$. For instance, ${}^0L_{ij} = H_{ij}$. The condition on $P$ if $\alpha$ exists is as follows:
There must exist an infinite set E of states such that 
k 
lim lim at = 0 forallj. 


ko {=> o 
ieE " 
30. Prove that
Ly — *Ly No 
Hy ~ No Pop 


for any n with PẸ > 0. 
31. Let 
IPP = Prix, =k At,#AJ if O< m< n 
Show that 


ao oo 
k pi n, 
Lis >. > PPP, 
r=k n 


=0 


and derive as a consequence that 
k-1 
k 
Ly < Ny - > NaP; 
r=0 


32. Define $f$ by $f_j = \delta_{0j}$. If an $\alpha > 0$ exists with $\alpha P = \alpha$, show that there is a point $x_0$ of ${}^*S$ for which $J(x_0, \cdot)$ is a minimal measure. Choose $E$ to be the set of states in some Cauchy sequence converging to $x_0$.


33. Prove that 


lim lim 


k> œ i> o 
ie€E 


N, "<M, 
st > at P| = 0. 
Ee r=0 Nio : 


34. Put together the preceding results to obtain a proof of the necessity of 
the stated condition. 


Problems 35 to 38 refer to a transient chain with absorbing states. As usual, we write the transition matrix in the form
$$P = \begin{pmatrix} I & 0 \\ R & Q \end{pmatrix}.$$


35. Put
$$P' = \begin{pmatrix} 0 & 0 \\ R & Q \end{pmatrix}.$$
Show that $P'$ has only transient states. Let $\pi$ be any starting distribution for $P'$ with $\pi N' > 0$. Prove that if $i$ is an absorbing state of $P$, then the harmonic measure $\mu(i)$ in $P'$ is equal to the probability in $P$ of absorption in $i$.


36. If $P$ is the infinite drunkard's walk with p = 3, find harmonic measure for $P'$ when $\pi_1 = 1$.


37. Suppose P1 = 1. Find a necessary and sufficient condition on harmonic 
measure for P’ that P should have been an absorbing chain. 


38. Let $Q$ be any chain with all states transient and with the enlarged chain $\hat Q$ absorbing. Use the condition of Problem 37 to give a new proof that $Q$ has no non-zero bounded regular functions $h \ge 0$.


CHAPTER 11 


RECURRENT BOUNDARY THEORY 


1. Entrance boundary for recurrent chains 


Boundary theory for recurrent chains proceeds along altogether 
different lines from the approach in Chapter 10. A clue to the 
difficulty is that every non-negative superregular function is constant, 
and hence the representation of such functions degenerates. More- 
over, since a recurrent chain is in every state infinitely often with 
probability one, an almost-everywhere convergence theorem is out of 
the question. 

But intuitively, at least for some recurrent chains, there is some 
limiting behavior going on. For instance, with the one-dimensional 
symmetric random walk the Central Limit Theorem implies that the 
probability that the process is on either half-axis after time $n$ tends to $\frac12$ as $n$ increases.

If $P$ is a normal chain and $E$ is a finite set, then $(P^nB^E)_{ij}$ is the probability that the process started in $i$ enters $E$ after time $n$ at state $j$, and its limit $\lambda^E_j$ is the “long run” probability of entering $E$ at $j$. As we let $E$ swell to the whole state space $S$, it is not clear from this interpretation just what happens. Consider therefore the following alternate interpretation: $(P^nB^E)_{ij}$ is the probability that the process started in $i$ at time $-n$ enters $E$ after time 0 at state $j$, and the limit $\lambda^E_j$ is the probability that the process started at time $-\infty$ enters $E$ after time 0 at state $j$. In this interpretation, if we let $E$ swell to $S$ and pass to the limit in the appropriate sense, we can expect $\lambda^E$ to converge to an entrance distribution for the chain.

Suppose we compute instead the limit of $P^nB^E$ on $E$ followed by the limit on $n$. By dominated convergence, we have $\lim_{E \to S} P^nB^E = P^n$. Therefore, if we can justify the interchange of limits on $E$ and $n$, we see that each row of $P^n$ can be expected to converge to an entrance distribution for the chain. This conjecture is in contrast with the situation for transient chains, where $P^n$ converges to an exit distribution.
In other words, the limiting behavior of $P^n$ has something to do with the past history of the process. Our procedure will therefore be as follows: We start with a recurrent chain $P$, make it disappear after it reaches 0, and apply transient entrance boundary theory to the resulting process. The boundary obtained will be the entrance boundary for $P$, and it will be suitable in a wide class of chains for describing the limiting behavior of both $\lambda^E$ and $P^n$. We shall see that this procedure is canonical in that it does not depend upon the choice of the state 0.

For the remainder of this chapter, let $P$ be a recurrent chain, let 0 be a distinguished state, and let $\alpha$ be a finite-valued positive regular measure; $\alpha$ is unique up to a constant factor.

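We define a transient chain $Q$ associated to $P$ and 0 by
$$Q_{ij} = \begin{cases} P_{ij} & \text{if } i \ne 0, \\ 0 & \text{if } i = 0. \end{cases}$$
For this transient chain $N_{ij} = {}^0N_{ij} + \delta_{0j}$. Choose $f$ to be a column vector which places a unit weight at 0. Then $Nf = \mathbf{1}$.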

Form the Martin entrance boundary of $Q$ with respect to the reference function $f$. We have
$$J(i, j) = \frac{N_{ij}}{(Nf)_i} = {}^0N_{ij} + \delta_{0j}.$$
The compact metric space ${}^*S$ is the completion of $S$ in the metric described in Chapter 10, and we shall see in Proposition 11-1 that ${}^*S$ does not depend upon the choice of the state 0. The spaces ${}^*S$ and ${}^*S - S$ can unambiguously be called the completed space and the recurrent entrance boundary, respectively, of the chain $P$. The set $B^e$ of extreme points of ${}^*S - S$ will also be independent of state 0, and we are therefore free to speak of extreme points of the recurrent boundary.
Since $J(x, j)$ is well defined for $x$ in ${}^*S$, the above expression for $J$ shows that ${}^0N(x, j)$ is also well defined if we put
$${}^0N(x, j) = \begin{cases} {}^0N_{ij} & \text{if } x = i \in S, \\ \lim_n {}^0N_{i_nj} & \text{if } x \in {}^*S - S \text{ and } x = \lim_n i_n. \end{cases}$$


Proposition 11-1: If ${}^*S_0$ and ${}^*S_1$ are the completed spaces of $P$ formed with respect to two distinct states 0 and 1, then the identity map on $S$ extends to a homeomorphism of ${}^*S_0$ onto ${}^*S_1$. Under the homeomorphism the extreme parts of the boundary correspond.
PROOF: We first establish the identity
$${}^1N_{ij} = {}^0N_{ij} + {}^1N_{0j} - \frac{\alpha_j}{\alpha_1}\,{}^0N_{i1}. \qquad (*)$$
In fact, the mean number of times the process started at $i$ visits $j$ from the time of the first visit to 1 until before the first visit to 0 is ${}^0H_{i1}\,{}^0N_{1j}$. Thus if we compute in two ways the mean number of times the process started at $i$ visits $j$ before it has reached both 0 and 1, we obtain
$${}^1N_{ij} + {}^0H_{i1}\,{}^0N_{1j} = {}^0N_{ij} + {}^1H_{i0}\,{}^1N_{0j}.$$
Substitute ${}^1H_{i0} = 1 - {}^0H_{i1}$ and get
$${}^1N_{ij} = {}^0N_{ij} + {}^1N_{0j} - {}^0H_{i1}\left({}^1N_{0j} + {}^0N_{1j}\right).$$
Applying Lemma 9-9 to $\hat P$ under the identification $i \to 1$, $j \to 0$, and $k \to j$, we find that the term in parentheses equals $(\alpha_j/\alpha_1)\,{}^0N_{11}$. Since ${}^0H_{i1}\,{}^0N_{11} = {}^0N_{i1}$, the relation $(*)$ follows.

Equation $(*)$ and the expression for the kernel $J$ show immediately that Cauchy sequences of $S$ in the chain relative to state 0 are Cauchy relative to state 1, and by symmetry the converse is also true. Thus the statement about ${}^*S_0$ and ${}^*S_1$ follows.

For the assertion about the extreme points, let $J_0$ and $J_1$ be the kernels for the two transient chains, and suppose, for instance, that $J_1(y, \cdot)$ is normalized minimal and that $J_0(y, \cdot)$ is not. In any case, $J_0(y, \cdot)$ is normalized since $J_0(y, 0) = 1$. Thus
$$J_0(y, j) = \int J_0(x, j)\,d\nu(x)$$
by Theorem 10-48, where $\nu({}^*S) = 1$ and $\nu$ does not concentrate all its mass at one point. Extending equation $(*)$ to ${}^*S$ and using the connection between the kernels and the $N$'s, we have also
$$J_1(x, j) = J_0(x, j) + J_1(0, j) - \frac{\alpha_j}{\alpha_1}\,J_0(x, 1) - \delta_{0j}.$$
Integration of this equation against $\nu$ gives
$$J_1(y, j) = \int J_1(x, j)\,d\nu(x).$$
Since $J_1(y, \cdot)$ is normalized minimal and $\nu$ concentrates its weight at more than one point, this last equation contradicts the dual of Lemma 10-33.
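Identity $(*)$ can be tested on a small finite recurrent chain, where every quantity is computable exactly. In the sketch below (helper names hypothetical, example chain arbitrary), ${}^kN$ is computed as the inverse of $I - P$ restricted to $S - \{k\}$, and $\alpha_j/\alpha_1$ as the mean number of visits to $j$ during one excursion from 1; that excursion formula for the regular measure of a recurrent chain is a standard fact we assume here rather than one taken from this text.

```python
from fractions import Fraction as Fr


def inverse(M):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(M)
    A = [list(row) + [Fr(int(r == c)) for c in range(n)] for r, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def taboo_green(P, k):
    """kN[i][j]: mean number of times n >= 0 with x_n = j before the first visit to k.

    Row k and column k are zero, as in the conventions used for (*).
    """
    n = len(P)
    keep = [s for s in range(n) if s != k]
    inv = inverse([[Fr(int(a == b)) - P[a][b] for b in keep] for a in keep])
    N = [[Fr(0)] * n for _ in range(n)]
    for x, a in enumerate(keep):
        for y, b in enumerate(keep):
            N[a][b] = inv[x][y]
    return N


def alpha_ratio(P, k):
    """alpha_j / alpha_k = mean visits to j during one excursion from k."""
    N = taboo_green(P, k)
    n = len(P)
    return [Fr(1) if j == k else sum(P[k][l] * N[l][j] for l in range(n)) for j in range(n)]


# A small irreducible (hence recurrent) example chain; any such chain would do.
P = [[Fr(0), Fr(1, 2), Fr(1, 2), Fr(0)],
     [Fr(1, 3), Fr(0), Fr(1, 3), Fr(1, 3)],
     [Fr(1, 4), Fr(1, 4), Fr(1, 4), Fr(1, 4)],
     [Fr(0), Fr(1, 2), Fr(0), Fr(1, 2)]]

N0, N1 = taboo_green(P, 0), taboo_green(P, 1)
ratio = alpha_ratio(P, 1)


def identity_holds(i, j):
    """Equation (*): 1N_ij = 0N_ij + 1N_0j - (alpha_j/alpha_1) 0N_i1."""
    return N1[i][j] == N0[i][j] + N1[0][j] - ratio[j] * N0[i][1]


if __name__ == "__main__":
    print(all(identity_holds(i, j) for i in range(4) for j in range(4)))
```

Exact fractions are used so that the identity is checked with equality rather than up to rounding.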
2. Measures on the entrance boundary


Let $\delta_{0\cdot}$ and $\delta_{\cdot 0}$ denote the 0th row and column, respectively, of the identity matrix. In this section we consider finite-valued row vectors $\nu$ for the Markov chain $P$ with the following three properties:


(1) $\nu \ge 0$.
(2) $\nu_0 = 0$.
(3) $\nu P \le \nu + \delta_{0\cdot}$.


We shall associate to each such row vector a probability measure $\beta^\nu$ on the entrance boundary and show how $\beta^\nu$ is related to $\nu$ probabilistically.

Direct calculation shows that the row vector $\bar\nu$ whose $j$th entry is $\bar\nu_j = \nu_j + \delta_{0j}$ is non-negative and $Q$-superregular. We introduce a process which, when $\bar\nu > 0$, is the $\bar\nu$-dual of $Q$. Let $S^\nu = \{i \mid \nu_i > 0\}$ and define
$$Q^\nu_{ij} = \frac{\bar\nu_j\,Q_{ji}}{\bar\nu_i} \qquad \text{for } i, j \in S^\nu \cup \{0\}.$$
All other entries of $Q^\nu$ are taken to be 0. Note that
$$N^\nu_{0j} = \frac{\bar\nu_j\,N_{j0}}{\bar\nu_0} = \bar\nu_j = \nu_j + \delta_{0j}.$$


Proposition 11-2: If $\nu$ satisfies (1), (2), and (3), then there is a unique probability measure $\beta^\nu$ on the Borel sets of $S \cup B^e$ such that
$$\nu = \int_{S\cup B^e} {}^0N(x, \cdot)\,d\beta^\nu(x).$$
PROOF: By Theorem 10-48
$$\bar\nu = \int_{S\cup B^e} J(x, \cdot)\,d\beta^\nu(x)$$
for a unique measure $\beta^\nu$. Since $\bar\nu_0 = 1$, $\bar\nu$ is normalized and $\beta^\nu(S \cup B^e) = 1$. Hence
$$\nu = \int_{S\cup B^e} {}^0N(x, \cdot)\,d\beta^\nu(x).$$
If another such probability measure is given, we can reverse the argument and conclude the measure is $\beta^\nu$ by the uniqueness half of Theorem 10-48.


We define $\theta^E_j$ to be the probability in the $Q^\nu$-process started at 0 that there is a last time the process is in $E$ and that this occurrence is at state $j$. Note that $\theta^E$ depends on $P$, 0, $\nu$, and $E$ and that $\theta^E_j = 0$ for $j$ not in $E$.


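Proposition 11-3: If $E$ is any subset of $S$ containing 0 and if $\nu$ satisfies (1), (2), and (3), then
$$\nu_E(I - P^E) = \theta^E - \delta_{0\cdot}.$$


PROOF: Using as a set of alternatives the time when the $Q^\nu$-process is last in $E$, we have, for $j \in E$,
$$\theta^E_j = \sum_{n=0}^{\infty} (Q^\nu)^n_{0j}\,\Pr_j[\text{process leaves } E \text{ immediately and never returns}] = N^\nu_{0j}\Big[1 - \sum_{k\in E}\Big(Q^\nu_{jk} + \sum_{c,d\in\tilde E} Q^\nu_{jc}\,\tilde N^\nu_{cd}\,Q^\nu_{dk}\Big)\Big],$$
where $\tilde E = S - E$ and $\tilde N^\nu$ (respectively $\tilde N$) is the fundamental matrix of $Q^\nu$ (respectively $P$) restricted to $\tilde E$. Now $N^\nu_{0j} = \nu_j + \delta_{0j}$, and therefore
$$N^\nu_{0j}\,Q^\nu_{jk} = \nu_k P_{kj}$$
and
$$N^\nu_{0j}\,Q^\nu_{jc}\,\tilde N^\nu_{cd}\,Q^\nu_{dk} = \nu_k P_{kd}\,\tilde N_{dc}\,P_{cj}$$
for all states, not just those in $S^\nu \cup \{0\}$. Substitution and use of Lemma 6-6 give
$$\theta^E_j = \nu_j + \delta_{0j} - \sum_{k\in E} \nu_k P^E_{kj}.$$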


In general, $\theta^E$ has total measure at most one, since the $Q^\nu$-process either may fail to reach $E$ or may return to it infinitely often. If $E$ is finite and $0 \in E$, neither of these alternatives has positive probability and thus $\theta^E\mathbf{1} = 1$. This conclusion also follows from Proposition 11-3, since finite matrices associate.

The special case $E = S$ yields the following corollary. Let $\beta^\nu$ also stand for the restriction to $S$ of the measure $\beta^\nu$ defined in Proposition 11-2.


Corollary 11-4: If $\nu$ satisfies (1), (2), and (3), then
$$\nu(I - P) = \beta^\nu - \delta_{0\cdot}.$$


PROOF: Let $E = S$ in Proposition 11-3. The only way the process can leave $S$ is to disappear, and thus $\theta^S_j$ is the probability that the $Q^\nu$-process disappears from state $j$. But this is just the definition of $\beta^\nu(j)$, since $\beta^\nu$ was defined as harmonic measure for the $Q^\nu$-process started at 0.


We conclude this section by noting the connection between $\theta^E$ and $\beta^\nu$. The proof is contained in the proof of Proposition 10-21.

Proposition 11-5: If $\nu$ satisfies (1), (2), and (3) and if $\{E_k\}$ is an increasing sequence of finite sets of states with union $S$, then the measures $\theta^{E_k}$ converge to $\beta^\nu$ weak-star on ${}^*S$.


3. Harmonic measure for normal chains 


We come now to the first convergence theorem. We shall prove that if $P$ is a normal chain, then the measures $\lambda^E$ converge weak-star to a measure $\beta$ which will play the role of an entrance distribution for $P$. This result agrees with the statement in Section 1.


Lemma 11-6: If $P$ is normal, then the row vector $\nu = G_{0\cdot}$ satisfies conditions (1), (2), and (3). Also, for any finite set $E$ containing 0,
$$\nu_E(I - P^E) = \lambda^E - \delta_{0\cdot}.$$


PROOF: We know that $G_{0j} \ge 0$ and $G_{00} = 0$. Hence (1) and (2) hold. Condition (3) follows by multiplying the definition of $G_{0\cdot}$ through on the right by $P$ and applying Fatou's Theorem.

Form the $K$-matrix of Definition 9-80 with respect to the distinguished state 0. By Lemma 9-81, $K_{0j} = G_{0j}$. Hence the formula for $\nu_E(I - P^E)$ is the 0th row of the formula of Corollary 9-86.


Harmonic measure $\beta$ for a normal chain $P$ is defined to be the measure $\beta = \beta^\nu$ of Proposition 11-2 for $\nu = G_{0\cdot}$. The justification for a name independent of 0 is contained in the following theorem.


Theorem 11-7: If $\{E_k\}$ is an increasing sequence of finite sets with union $S$, then the measures $\lambda^{E_k}$ converge weak-star to $\beta$ on ${}^*S$. The measure $\beta$ is independent of the distinguished state 0. Also
$$G_{ij} = \int_{S\cup B^e} {}^iN(x, j)\,d\beta(x)$$
and
$$G(I - P) = \mathbf{1}\beta - I.$$


PROOF: Ultimately the sets $E_k$ contain 0, and Proposition 11-3 and Lemma 11-6 apply. From these results we obtain $\lambda^{E_k} = \theta^{E_k}$, and from Proposition 11-5 we conclude that $\lambda^{E_k}$ converges to $\beta$. Thus $\beta$ is given a characterization independent of the state 0 (since $\lambda^E$ does not depend on 0). Since $\beta$ does not depend on 0, we can use any state $i$ as distinguished state in Proposition 11-2 and Corollary 11-4. The two formulas of the theorem then follow.
The first convergence theorem, which was proved in Section 3, resulted from examining a particular row vector $\nu$ satisfying conditions (1), (2), and (3) of Section 2, namely $\nu = G_{0\cdot}$. We now begin to work toward the second convergence theorem, and we do so by examining the vector $\nu = {}^0N(x, \cdot)$. For the present we do not assume that $P$ is normal.

We start by checking that $\nu = {}^0N(x, \cdot)$ does satisfy the three conditions and by identifying $\theta^E$ for $\nu$. In fact, $\nu \ge 0$ and $\nu_0 = 0$ because ${}^0N(i, \cdot) \ge 0$ and ${}^0N(i, 0) = 0$. Thus (1) and (2) hold. Moreover we have, for every $i \ne 0$,
$${}^0N(i, \cdot)P \le {}^0N(i, \cdot) + \delta_{0\cdot}.$$
Thus we can let $i \to x$, using Fatou's Theorem, and obtain (3).

The procedure for calculating $\theta^E$ is to do so for the approximating vectors ${}^0N(i, \cdot)$ first.


Lemma 11-8: If $E$ is a finite set containing 0 and $j$, then
$$\sum_{k\in E} {}^0N_{ik}(I - P^E)_{kj} = B^E_{ij} - \delta_{0j}.$$


PROOF: Apply Lemma 9-37 and conclusion (1) of Theorem 9-15.


REMARK: The lemma remains valid for infinite sets $E$ containing 0 and $j$, and the proof consists in computing the dual of the left side by means of Propositions 6-16 and 6-17 and Lemma 9-14.
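For a finite recurrent chain, Lemma 11-8 can be checked directly. The sketch below (helper names hypothetical, example chain arbitrary) builds $P^E$ as $P_{kj} + \sum_{c,d \notin E} P_{kd}\tilde N_{dc}P_{cj}$, where $\tilde N$ is the fundamental matrix of $P$ restricted to $S - E$, and $B^E$ as the distribution of the first entry (at time $\ge 0$) into $E$; these constructions are our assumptions about the notation of the earlier chapters. Taking $E = S$ gives ${}^0N_{i\cdot}(I - P) = \delta_{i\cdot} - \delta_{0\cdot}$, an instance of Corollary 11-4 in which $\beta^\nu$ is a unit mass at $i$.

```python
from fractions import Fraction as Fr


def inverse(M):
    """Exact Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(M)
    A = [list(row) + [Fr(int(r == c)) for c in range(n)] for r, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def restricted_green(P, states):
    """(I - P restricted to `states`)^{-1}, as a dict of dicts."""
    inv = inverse([[Fr(int(a == b)) - P[a][b] for b in states] for a in states])
    return {a: {b: inv[x][y] for y, b in enumerate(states)} for x, a in enumerate(states)}


def taboo_green_0(P):
    """0N[i][j]: mean visits to j before the first visit to 0 (row 0, column 0 zero)."""
    n = len(P)
    G = restricted_green(P, [s for s in range(n) if s != 0])
    return [[G[i][j] if i != 0 and j != 0 else Fr(0) for j in range(n)] for i in range(n)]


def watched_and_hitting(P, E):
    """Return (PE, BE): the chain watched on E and the first-entry distribution on E."""
    n = len(P)
    out = [s for s in range(n) if s not in E]
    G = restricted_green(P, out) if out else {}
    PE = {k: {j: P[k][j] + sum(P[k][d] * G[d][c] * P[c][j] for d in out for c in out)
              for j in E} for k in E}
    BE = {}
    for i in range(n):
        if i in E:
            BE[i] = {j: Fr(int(i == j)) for j in E}
        else:
            BE[i] = {j: sum(G[i][c] * P[c][j] for c in out) for j in E}
    return PE, BE


def lemma_11_8_holds(P, E):
    """Check sum_k 0N_ik (I - P^E)_kj = B^E_ij - delta_0j for all i and all j in E."""
    N0 = taboo_green_0(P)
    PE, BE = watched_and_hitting(P, E)
    for i in range(len(P)):
        for j in E:
            lhs = sum(N0[i][k] * (Fr(int(k == j)) - PE[k][j]) for k in E)
            if lhs != BE[i][j] - Fr(int(j == 0)):
                return False
    return True


# A small irreducible (hence recurrent) example chain; any such chain would do.
P = [[Fr(0), Fr(1, 2), Fr(1, 2), Fr(0)],
     [Fr(1, 3), Fr(0), Fr(1, 3), Fr(1, 3)],
     [Fr(1, 4), Fr(1, 4), Fr(1, 4), Fr(1, 4)],
     [Fr(0), Fr(1, 2), Fr(0), Fr(1, 2)]]

if __name__ == "__main__":
    print([lemma_11_8_holds(P, E) for E in ([0, 2], [0, 1, 3], [0, 1, 2, 3])])
```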


We shall say that a column vector $h$ is continuous if it has a (necessarily unique) extension to a continuous function $h(x)$ defined on all of ${}^*S$.

In this notation the right side of the identity in Lemma 11-8 is continuous for fixed $j \in E$, because the left side is a finite linear combination of the continuous functions ${}^0N(\cdot, k)$; hence $B^E_{\cdot j}$ is continuous if $j \in E$. But $B^E_{\cdot j}$ is identically zero for $j \notin E$, and any state can be taken as the reference state 0. Thus we are justified in writing $B^E(x, j)$ for the continuous extension of the $j$th column of $B^E$ whenever $E$ is a nonempty finite set. By an elementary continuous function is meant a finite linear combination of such functions.

Passing to the limit $i \to x$ in Lemma 11-8, we have
$$\sum_{k\in E} {}^0N(x, k)(I - P^E)_{kj} = B^E(x, j) - \delta_{0j}$$
whenever $0 \in E$, $j \in E$, and $E$ is finite. Consequently $\theta^E = B^E(x, \cdot)$ by Proposition 11-3, provided $E$ is a finite set containing 0. (Note that both sides are zero for states $j$ not in $E$.)

Let us denote the measure $\beta^\nu$ obtained from Proposition 11-2 for $\nu = {}^0N(x, \cdot)$ by $\beta^x$. If $\{E_k\}$ is an increasing sequence of finite sets with union $S$, then Proposition 11-5 asserts that $B^{E_k}(x, \cdot)$ converges weak-star to $\beta^x$. We have thus proved the following proposition.


Proposition 11-9: If $h$ is a continuous column vector and if $\{E_k\}$ is an increasing sequence of finite sets with union $S$, then
$$\lim_k B^{E_k}(x, \cdot)h = \int_{{}^*S} h(y)\,d\beta^x(y)$$
pointwise for $x$ in ${}^*S$.


If $\nu = {}^0N(x, \cdot)$, then the associated $Q$-superregular measure is $\bar\nu = J(x, \cdot)$. Since $\beta^x$ is the measure for $J(x, \cdot)$, $\beta^x$ concentrates its mass at $x$ if $x$ is in $S$ or in $B^e$. Thus the right side of the identity of Proposition 11-9 equals $h(x)$ for all such $x$.

Define a linear transformation $T$ of continuous functions to bounded functions on ${}^*S$ by
$$Th(x) = \int_{{}^*S} h(y)\,d\beta^x(y).$$
A continuous function $h$ such that $Th = h$ is said to be $T$-continuous. (The motivation for this name appears as Problem 2 at the end of the chapter.)

We conclude this section by characterizing the $T$-continuous functions. Notice that if every boundary point is extreme, then every continuous function is $T$-continuous.


Lemma 11-10: If $h$ is an elementary continuous function, then $Th = h$.


PROOF: It suffices to consider the function $B^E_{\cdot j}$ where $E$ is a finite set. If $E \subset E_k$, then $B^{E_k}B^E = B^E$ by Proposition 5-8. Passing to the limit as $i \to x$ with $k$ fixed, we have
$$B^{E_k}(x, \cdot)B^E_{\cdot j} = B^E(x, j).$$
Hence the left side of the identity in Proposition 11-9 is $B^E(x, j)$. But the right side is $TB^E(x, j)$.


Lemma 11-11: If $h$ is $T$-continuous and $\{E_k\}$ is an increasing sequence of finite sets with union $S$, then for any $\epsilon > 0$ some convex combination of the functions $B^{E_k}(x, \cdot)h$ is within $\epsilon$ of $h$ uniformly for $x$ in ${}^*S$.
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Proor: By Proposition 11-9, the functions B*«(x, -)h converge 
pointwise to h(x). Thus by dominated convergence their integrals 
against any Borel measure on *S converge to the integral of h. That 
is, the functions converge to h weakly. Thus A is in the weak closure 
of the set {B*«(x, -)h}, is certainly in the weak closure of the convex hull 
of the set, and must therefore be in the strong closure of the convex 
hull of the set. (See Dunford and Schwartz [1958], p. 422, Corollary 
14.) 


Proposition 11-12: The set of T-continuous functions is exactly the 
uniform closure of the set of elementary continuous functions. 


PROOF: Every $T$-continuous function is contained in the uniform closure according to Lemma 11-11, since $B^{E_k}(x, j)$ vanishes unless $j$ is in $E_k$, so that each function $B^{E_k}(x, \cdot)h$ is a finite linear combination of columns of $B^{E_k}$. Conversely, every elementary continuous function is $T$-continuous by Lemma 11-10, and the uniform limit of $T$-continuous functions is $T$-continuous, since $T$ has norm no greater than one.


5. Normal chains and convergence to the boundary 


As an application of the machinery of Section 4, we can prove the second convergence theorem: each row of $P^n$ converges weak-star to the harmonic measure $\beta$ in a suitable class of normal chains. This result was suggested in the discussion in Section 1, and it was pointed out that the key to the proof should be a certain interchange of limits. In fact, this interchange has already taken place and is concealed in the proof of Lemma 11-11.

We begin with a particularly sharp form of the convergence theorem. 


Theorem 11-13: If $P$ is normal, if $i$ is any state in $S$, and if $h$ is $T$-continuous, then

$$\lim_{n \to \infty} (P^n h)_i = \int_{{}^*S} h(x)\, d\beta(x).$$

Conversely, if this equation holds for all states $i$ and all $T$-continuous $h$, then $P$ is normal.


PROOF: Let $\varepsilon > 0$ be given and let $\{E_k\}$ be an increasing sequence of finite sets with union $S$. Since $h$ is continuous, we can choose $k_0$ large enough so that

$$\Big| \int_{{}^*S} h(x)\, d\beta(x) - \lambda^{E_k} h \Big| < \varepsilon$$

for all $k \ge k_0$ by Theorem 11-7. Truncate the sequence of sets so that it contains only those sets $E_k$ with $k \ge k_0$. Since $h$ is $T$-continuous, we can apply Lemma 11-11 to the truncated sequence to obtain a convex combination of the functions $B^{E_k}(x, \cdot)h$ which is uniformly within $\varepsilon$ of $h$. If, say,


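$$\Big| \sum_k c_k B^{E_k} h - h \Big| < \varepsilon,$$

then also

$$\Big| P^n \Big( \sum_k c_k B^{E_k} h - h \Big) \Big| < \varepsilon,$$

since $P^n 1 = 1$ and $P^n \ge 0$. Consequently,

$$\begin{aligned}
\Big| \int h(x)\, d\beta(x) - (P^n h)_i \Big|
&\le \Big| \int h(x)\, d\beta(x) - \sum_k c_k \lambda^{E_k} h \Big|
 + \Big| \sum_k c_k \big(\lambda^{E_k} - P^n_i B^{E_k}\big) h \Big| \\
&\quad + \Big| P^n_i \Big( \sum_k c_k B^{E_k} h - h \Big) \Big| \\
&< 2\varepsilon + \Big| \sum_k c_k \big(\lambda^{E_k} - P^n_i B^{E_k}\big) h \Big|,
\end{aligned}$$

where $P^n_i$ denotes the $i$th row of $P^n$; the first term on the right is less than $\varepsilon$ because the $c_k$ form a convex combination and every $k$ in it satisfies $k \ge k_0$.

The sum on the right is a finite sum, and in each summand only finitely many entries of $\lambda^{E_k} - P^n_i B^{E_k}$ can be non-zero. Since $P^n_i B^{E_k} \to \lambda^{E_k}$ pointwise, we conclude that the sum on the right side is less than $\varepsilon$ for $n$ sufficiently large.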

The converse follows by applying the assumption to columns of $B^E$ for two-point sets $E$.


The convergence theorem is as follows. 


Theorem 11-14: If $P$ is normal and if $B = B^e$, then each row of $P^n$ converges weak-star to the harmonic measure $\beta$.

PROOF: Apply Theorem 11-13. If $B = B^e$, then every continuous function is $T$-continuous.


Thus for normal chains with $B = B^e$, the measure $\beta$ indicates what the chain is "near" in the long run. In the case of null chains with an additional property, we can show that the chain is near the boundary in the long run.


Proposition 11-15: If $P$ is a normal null chain with $B = B^e$ in which every one-point set in $S$ is open, then $\beta(S) = 0$. That is, the measure $\beta$ is entirely on the boundary.

PROOF: If $i$ is given, then the characteristic function of $\{i\}$ is continuous. By Theorem 11-14, $\lim_n P^n_{ki} = \beta(\{i\})$ for every state $k$. But for a null chain, $P^n_{ki}$ tends to zero.
Under the hypotheses of Proposition 11-15, the number $\beta(X)$ for any Borel subset $X$ of $B$ may be interpreted as the probability that the process is near this part of the boundary in the long run. For example, if $x$ is at a positive distance from other boundary points, then any sufficiently small neighborhood $E$ of $x$ will have a continuous characteristic function. By Theorem 11-14,

$$\lim_{n \to \infty} \Pr_i[x_n \in E] = \lim_{n \to \infty} \sum_{j \in E} P^n_{ij} = \beta(\{x\}).$$


Let us see what our results say for a noncyclic ergodic chain $P$. Such a chain is necessarily normal, and $\alpha$ may be chosen to have total measure one. If this choice is made, then $G(I - P) = 1\alpha - I$ by Corollary 9-51. Comparison with Theorem 11-7 shows that $\alpha = \beta$. Thus the harmonic measure is concentrated entirely on $S$, in contrast with Proposition 11-15. The measure $\beta$ is a generalization to all normal chains of the measure $\alpha$ for noncyclic ergodic chains. Thus our results generalize to all normal chains results known for ergodic chains. For example, the representation


$$G_{0j} = \int_{{}^*S} {}^0N(x, j)\, d\beta(x)$$

is a generalization of the identity $G_{0j} = \sum_k \alpha_k\, {}^0N_{kj}$, which holds for noncyclic ergodic chains. (Theorem 9-26 gives

$$G_{0j} = \lim_{n \to \infty} \sum_k P^n_{0k}\, {}^0N_{kj},$$

and Proposition 1-57 yields $G_{0j} = \sum_k \alpha_k\, {}^0N_{kj}$.) As a second example, $(P^n h)_i$ converges to $\alpha h$ for any bounded function $h$ if $P$ is noncyclic ergodic (Lemma 9-52). This result is generalized in Theorem 11-13, but in this theorem we had to make a stronger assumption about $h$. The difference arises because in a noncyclic ergodic chain each row of $P^n$ actually converges to $\alpha$ in the norm topology of measures, not just in the weak-star topology (see Theorem 6-38).


6. Representation Theorem 


Beginning in this section, we connect the results we have obtained 
so far with the results of Chapter 9 on potential theory. We start by 
proving a representation theorem. For the moment we do not assume 
that P is normal. 

If $\mu$ and $\nu$ are row vectors with $\mu = \nu(I - P)$, then $\mu$ will be called the deviation (from regularity) of $\nu$. If $\mu 1$ is finite, then we say that $\nu$ is of totally finite deviation. Dually, $f = (I - P)h$ is the deviation of $h$, and $h$ is of totally finite deviation if $\alpha f$ is finite.


Theorem 11-16: If $\nu$ is a non-negative row vector whose deviation $\mu$ is totally finite, then $\mu 1 \le 0$ and there is a probability measure $\pi$ on $B^e$ such that

$$\nu_j = \nu_0 (\alpha_j / \alpha_0) + (\mu\, {}^0N)_j - (\mu 1) \int_{B^e} {}^0N(x, j)\, d\pi(x).$$

If $\mu 1 \ne 0$, then the probability measure $\pi$ is uniquely determined.


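PROOF: We know that

$$N_{ij} = {}^0N_{ij} + \delta_{0j}.$$

If we put $\tilde\mu = \nu(I - Q)$, we have $\tilde\mu_j = \mu_j + \nu_0 P_{0j}$ by direct calculation. We therefore get

$$\begin{aligned}
(\tilde\mu N)_j &= (\mu\, {}^0N)_j + (\mu 1)\delta_{0j} + \nu_0 \big[(P\, {}^0N)_{0j} + \delta_{0j}\big] \\
&= (\mu\, {}^0N)_j + (\mu 1)\delta_{0j} + \nu_0 (\alpha_j / \alpha_0).
\end{aligned}$$

From Proposition 8-7 applied to the chain $Q$, we see that $\nu = \tilde\mu N + \rho$, where $\rho$ is regular and non-negative. The calculation of $\tilde\mu N$ yields

$$\rho_j = \nu_j - (\tilde\mu N)_j = \begin{cases} -(\mu 1) & \text{if } j = 0, \\ \nu_j - \nu_0(\alpha_j/\alpha_0) - (\mu\, {}^0N)_j & \text{if } j \ne 0. \end{cases}$$

Since $\rho_0 \ge 0$, $\mu 1 \le 0$. If $\mu 1 = 0$, then $\rho$ is a $Q$-regular measure $\ge 0$ with $\rho_0 = 0$. Since it is possible to get from any state to 0 eventually in the $Q$-process, $\rho$ vanishes identically. The representation then follows immediately from the expression for $\rho$.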

Thus suppose $\mu 1 \ne 0$. Define $\sigma$ by

$$\sigma_j = -(\mu 1)^{-1}\big[\nu_j - \nu_0(\alpha_j/\alpha_0) - (\mu\, {}^0N)_j\big].$$

We claim that $\sigma$ satisfies conditions (1), (2), and (3). It is clear that $\sigma_0 = 0$, and the fact that $\sigma \ge 0$ follows from what we have shown above. For (3), we have

$$(\sigma P)_j = -(\mu 1)^{-1}(\rho Q)_j = -(\mu 1)^{-1}\rho_j = \sigma_j + \delta_{0j},$$

and thus equality holds. Thus, except for the assertion that $\pi$ is concentrated on $B^e$, the rest of the theorem is immediate from Proposition 11-2. For that assertion, simply note from the proof of that proposition that if we have equality in (3), then $\pi$ is concentrated on $B^e$.
We define the exit boundary of $P$ to be the entrance boundary of $\hat P$. That is, $S^* = {}^*\hat S$ and $B_e = \hat B^e$. We obtain a dual for Theorem 11-16.

Theorem 11-17: If $h$ is a non-negative column vector whose deviation $f$ is totally finite, then $\alpha f \le 0$ and there is a probability measure $\pi$ on $B_e$ such that

$$h_i = h_0 + ({}^0N f)_i - \frac{\alpha f}{\alpha_i} \int_{B_e} {}^0\hat N(x, i)\, d\pi(x).$$

If $\alpha f \ne 0$, then the probability measure $\pi$ is uniquely determined.


We turn to results connected with potential theory. We first apply Theorem 11-17 to give elementary continuous functions a characterization which is valid for all recurrent chains, normal or not.


Proposition 11-18: If $h$ is an elementary continuous function, then

(1) $(I - P)h = f$ has finite support.

(2) $h = c1 + {}^0Nf$ for some $c$.

(3) $\alpha f = 0$.

(4) $B^E h = h$ for any set $E$ containing the support of $f$.

Conversely, if (1) and (2) hold, then $h$ is an elementary continuous function.


PROOF: $(I - P)B^E$ is equal to $I - P^E$ for states in $E$ and is 0 otherwise. Hence $\alpha[(I - P)B^E] = 0$, and (1) and (3) hold for columns of $B^E$. Since an elementary continuous function is a finite linear combination of such functions for finite sets $E$, (1) and (3) follow. In particular, any column of $B^E$ is of totally finite deviation and Theorem 11-17 applies. The representation in that theorem establishes (2) for columns of $B^E$, and the general result follows by linearity.

We shall complete the proof by showing that (2) implies (4) and that (1) and (4) imply that $h$ is an elementary continuous function. Suppose (2) holds. Let $0 \in E$. Then ${}^0N = B^E\, {}^0N + {}^EN$ by Lemma 9-14. If $E$ contains also the support of $f$, then ${}^ENf = 0$, and we see directly (using $B^E 1 = 1$) that $B^E h = c1 + ({}^0N - {}^EN)f = h$. So (4) holds. If (1) and (4) hold, then (4) holds for some finite set $E$, and $h$ is exhibited as a finite linear combination of the columns of $B^E$.


We now assume that P is normal and we shall prove statements for 
T-continuous functions that look like applications of Proposition 11-18 
followed by a passage to the limit. 
Proposition 11-19: If $P$ is normal, then a function $h$ of finite deviation is $T$-continuous if and only if $h = c1 + g$ for a $T$-continuous potential $g$. If the representation holds, then

$$c = \int_{{}^*S} h(x)\, d\beta(x).$$

PROOF: If $(I - P)h = f$ and if $h$ is $T$-continuous, then

$$(I + P + \cdots + P^n) f = (I - P^{n+1}) h \to h - c1$$

by Theorem 11-13. If $h$ is of finite deviation, then $f$ is a charge and the limit $h - c1$ is a potential. This potential is $T$-continuous, since $1$ is. (Notice that $1$ is the 0th column of $B^{\{0\}}$.) The converse is clear.


Corollary 11-20: If $P$ is normal and if $g$ is a $T$-continuous potential, then

$$\int_{{}^*S} g(x)\, d\beta(x) = 0.$$

PROOF: Potentials are of finite deviation by definition. Applying Proposition 11-19 and using the fact that $1$ is not a potential (since its charge would have to be zero), we see that $c = 0$.


Corollary 11-20 is a recurrent analog of the statement for transient 
boundary theory that potentials tend to zero along almost all paths of 
the process. 


Corollary 11-21: If $P$ is normal and if $h$ is a $T$-continuous function whose deviation $f$ is totally finite, then $\alpha f = 0$ and $h = c1 + {}^0Nf$.

PROOF: By Proposition 11-19, $h$ differs from the potential $g$ of $f$ by a constant. But $g = g_0 1 + {}^0Nf$ by Theorem 9-15.


Our final proposition enables us to give an interpretation to $\theta^E$ for certain infinite sets provided we have an interpretation for finite sets.

Proposition 11-22: If $\{E_k\}$ is an increasing sequence of finite sets with union $E$ and if $\theta^E 1 = 1$, then $\lim_k \theta^{E_k} = \theta^E$ pointwise.

PROOF: Since $\theta^{E_k}_j$ is an exit probability, it is decreasing in $k$ from some point on. Hence it tends to a limit, say $\theta_j$. Since $\theta^{E_k} 1 = 1$, we have $\theta 1 \le 1$ by Fatou's Theorem. For large $k$, we have also $\theta^E_j \le \theta^{E_k}_j$ and hence $\theta^E \le \theta$. Thus $1 = \theta^E 1 \le \theta 1 \le 1$. This statement is consistent with the inequality $\theta^E \le \theta$ only if $\theta^E = \theta$.
As an application of Proposition 11-22, suppose $P$ is normal and we choose $\nu_j = G_{0j}$. Then $\theta^E = \lambda^E$ for finite sets containing 0, and $\theta^E$ is thus a limit of $\lambda^{E_k}$-measures for any set with $\theta^E 1 = 1$, whether or not $\lambda^E$ exists for the infinite set $E$. The condition that $\theta^E 1 = 1$ means that the transient chain $Q^\nu$ leaves $E$ with probability one; this condition is exactly the statement that $E$ be an equilibrium set for $Q^\nu$. For such a set $E$ and for any reference state $i \in E$, we have

$$[G(I - P^E)]_{ij} = \theta^E_j - \delta_{ij}$$

by Proposition 11-3. Since, for two different reference states $i$ in $E$, we have $\theta^E$ as the limit of the same sequence $\lambda^{E_k}$ (by Proposition 11-22), we may write $G_E(I - P^E) = 1\theta^E - I$. Then


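$$(\theta^E G_E)(I - P^E) = 0,$$

and hence $\theta^E G_E = k\alpha$ for some constant $k$. If we form the $K$-matrix relative to state 0, then Lemma 9-81 gives

$$G_{ij} = K_{ij} + (G_{i0} - C_{i0})(\alpha_j / \alpha_0).$$

Consequently, $\theta^E K_E = k'\alpha$, provided

$$\sum_{i \in E} \theta^E_i G_{i0} < \infty \quad \text{and} \quad \sum_{i \in E} \theta^E_i C_{i0} < \infty.$$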
The conclusion that $\theta^E K_E = k'\alpha$ is exactly the statement that $k'$ be the generalized capacity of $E$ and $\theta^E$ be the (generalized) recurrent equilibrium charge for $E$ in the sense of Definitions 9-88 and 9-113. However, in the part of Chapter 9 where these points were discussed, we restricted ourselves to finite sets. By means of boundary theory we can extend these results to certain infinite sets.

Finally, let us consider the special case where the space ${}^*S$ for $P$ has only one limit point $\infty$. Then a neighborhood of $\infty$ is simply the complement of a finite set not containing $\infty$. Hence the probability of being within this neighborhood tends to one if $P$ is null. Therefore $P^n h$ tends to $h(\infty)$ for any such null chain and any continuous function $h$.

If $P$ is null and if $P$ and $\hat P$ both have only a single limit point, then $P$ must be normal. For such a chain,

$$G_{0j} = {}^0N(\infty, j) \quad \text{and} \quad C_{i0} = (\alpha_0/\alpha_i)\, {}^0\hat N(\infty, i),$$

and the representation theorem takes on the simpler form

$$h_i = h_0 + ({}^0Nf)_i - (\alpha f/\alpha_0)\, C_{i0}.$$

From Proposition 9-45 we know that

$${}^0N_{ij} - (\alpha_j/\alpha_0)\, C_{i0} = C_{0j} - C_{ij}.$$

Hence

$$h_i = h_0 + \sum_j (C_{0j} - C_{ij}) f_j.$$

Thus if $Cf$ is finite-valued, then $h$ differs from $-Cf$ by a constant.


7. Sums of independent random variables 


The results of this chapter take on an especially neat-looking form where $P$ is a recurrent sums of independent random variables process with state space the lattice in $N$-dimensional space. Such a chain is always null, and Spitzer [1962] showed it is always normal, has only minimal boundary points, and has no points of $S$ as limit points. In two dimensions there is always a unique boundary point, whereas in one dimension there are one or two, depending on whether the distribution has infinite or finite variance.

In the case of a single boundary point, a continuous function $h$ is one having a limit at infinity, and we have already noted that the convergence of $P^n h$ is trivial. In one dimension with finite variance, the two distinct boundary points correspond to $-\infty$ and $+\infty$, and $h$, to be continuous, must have limits in both directions. For such a function Theorem 11-14 states that $P^n h$ converges to the average of these two limits. This result also follows from the Central Limit Theorem.
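The statement about finite-variance walks can be watched numerically. The sketch below assumes the walk of Example 1 below ($p_{-1} = 2/3$, $p_2 = 1/3$) and a hypothetical function $h$ with limit $a = -1$ at $-\infty$ and $b = 3$ at $+\infty$; it computes $(P^n h)_0$ from the exact $n$-step distribution (no simulation), so the values should drift toward $(a + b)/2$.

```python
from collections import defaultdict

def step(dist):
    """Push a distribution one step: -1 w.p. 2/3, +2 w.p. 1/3 (the walk of Example 1)."""
    new = defaultdict(float)
    for x, m in dist.items():
        new[x - 1] += m * 2 / 3
        new[x + 2] += m / 3
    return new

def Pn_h(h, n, start=0):
    """(P^n h)_start, computed from the exact n-step distribution (no simulation)."""
    dist = {start: 1.0}
    for _ in range(n):
        dist = step(dist)
    return sum(m * h(x) for x, m in dist.items())

# Hypothetical h with limit a at -infinity and limit b at +infinity.
a, b = -1.0, 3.0

def h(x):
    return b if x > 0 else a

for n in (10, 100, 400):
    print(n, round(Pn_h(h, n), 4))   # drifts toward (a + b) / 2 = 1.0
```

Because every state of $S$ is isolated in ${}^*S$ here, any function with limits at $\pm\infty$ is continuous, even one with a jump at 0 like this $h$.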

There are such chains in one dimension for which $P^n h$ fails to converge for as nice a function as the characteristic function of the positive integers. However, this behavior can occur only if the variance is infinite; then there is only one boundary point, and $h$ is not continuous.

If $h \ge 0$ is a function whose deviation $f$ is totally finite and if $Cf$ is finite-valued, then the representation theorem takes the form

$$h = -Cf + \text{const.}$$

for the case of one boundary point and

$$h_i = -(Cf)_i + ai + b$$

for the two-boundary-point one-dimensional case. We have already seen that the former identity holds for any chain with a single limit point. The latter follows from the representation theorem together with special knowledge of the nature of $C$ obtained by Spitzer for sums of independent random variables.


8. Examples 


EXAMPLE 1: Sums of independent random variables, $p_{-1} = 2/3$, $p_2 = 1/3$.
This example shows the best possible boundary behavior for a null chain and illustrates the points discussed in Section 7. To determine the recurrent entrance boundary we can work either with the kernel $J(i, j)$ associated to the chain $Q$ or with the matrix ${}^0N_{ij}$ associated to $P$, since

$$J(i, j) = {}^0N_{ij} + \delta_{j0}.$$

We use the latter.

To compute ${}^0N$, we can proceed in the familiar way, first finding ${}^0H$ and then computing ${}^0N$ from the identities ${}^0\bar H = P\, {}^0H$, ${}^0N_{jj} = 1/(1 - {}^0\bar H_{jj})$, and ${}^0N_{ij} = {}^0H_{ij}\, {}^0N_{jj}$. As usual, the calculation of ${}^0H$ involves a difference equation valid for certain intervals of $i$'s and $j$'s. In this case, the difference equation is

$${}^0H_{ij} = \tfrac{2}{3}\, {}^0H_{i-1,j} + \tfrac{1}{3}\, {}^0H_{i+2,j}.$$

Its characteristic polynomial is $z^3 - 3z + 2 = (z - 1)^2(z + 2)$, so the general solution as a function of $i$ is

$${}^0H_{ij} = A + Bi + C(-2)^i,$$

and it is valid always for a slightly larger interval of $i$'s. For instance, if $0 < j$ and if we are considering $i$'s satisfying $0 \le i \le j$, then the difference equation is valid for $0 < i < j$ and the solution applies for $0 \le i \le j + 1$. The initial conditions in this instance are ${}^0H_{0j} = 0$, ${}^0H_{jj} = 1$, and ${}^0H_{j+1,j} = 1$, and they determine $A$, $B$, and $C$. Similar remarks apply for the other intervals of $i$'s and $j$'s, and the result is:


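For $0 < j$:

$${}^0N_{ij} = \begin{cases} \tfrac13[1 - (-2)^i][1 - (-2)^{-j}] & i \le 0, \\ i + \tfrac13[(-2)^{i-j} - (-2)^{-j}] & 0 \le i \le j, \\ j + \tfrac13[1 - (-2)^{-j}] & j \le i. \end{cases}$$

For $j < 0$:

$${}^0N_{ij} = \begin{cases} \tfrac13(-2)^i[(-2)^{-j} - 1] - j & i \le j, \\ \tfrac13[1 - (-2)^i] - i & j \le i \le 0, \\ 0 & 0 < i. \end{cases}$$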


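Therefore there are two boundary points, $+\infty$ and $-\infty$, no point of $S$ is a limit point, and

$${}^0N(+\infty, j) = \begin{cases} j + \tfrac13[1 - (-2)^{-j}] & j \ge 0, \\ 0 & j \le 0, \end{cases} \qquad {}^0N(-\infty, j) = \begin{cases} \tfrac13[1 - (-2)^{-j}] & j \ge 0, \\ -j & j \le 0. \end{cases}$$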
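As a consistency check on the closed form for ${}^0N$, the sketch below verifies with exact rational arithmetic that it satisfies the taboo equations ${}^0N_{ij} = \delta_{ij} + \sum_k P_{ik}\, {}^0N_{kj}$ for $i \ne 0$ (with ${}^0N_{0j} = 0$) on a finite window of states. This checks the linear equations only, not the minimality that singles out ${}^0N$ among their solutions; the function names are illustrative.

```python
from fractions import Fraction

P_LEFT, P_RIGHT2 = Fraction(2, 3), Fraction(1, 3)   # p_{-1}, p_{+2}

def pw(k):
    """(-2)**k as an exact rational; k may be negative."""
    return Fraction(-2) ** k

def N0(i, j):
    """Closed form for the taboo-0 matrix 0N_{ij} of Example 1 (0N_{0j} = 0)."""
    if j == 0:
        return Fraction(0)
    if j > 0:
        if i <= 0:
            return Fraction(1, 3) * (1 - pw(i)) * (1 - pw(-j))
        if i <= j:
            return i + Fraction(1, 3) * (pw(i - j) - pw(-j))
        return j + Fraction(1, 3) * (1 - pw(-j))
    if i <= j:                      # from here on j < 0
        return Fraction(1, 3) * pw(i) * (pw(-j) - 1) - j
    if i <= 0:
        return Fraction(1, 3) * (1 - pw(i)) - i
    return Fraction(0)

def residual(i, j):
    """0N_{ij} - delta_{ij} - sum_k P_{ik} 0N_{kj}; zero for i != 0 if the formula is right."""
    delta = 1 if i == j else 0
    return N0(i, j) - delta - (P_LEFT * N0(i - 1, j) + P_RIGHT2 * N0(i + 2, j))

bad = [(i, j) for i in range(-25, 26) for j in range(-10, 11)
       if i != 0 and j != 0 and residual(i, j) != 0]
print(bad)                      # []
# For i >= j the value no longer depends on i, which gives 0N(+inf, j):
print(N0(50, 1), N0(50, 2))     # 3/2 9/4
```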
From the Central Limit Theorem we know that for large $n$ the process is very likely to be far from 0 and equally likely to be on the right or on the left. Hence $\beta(+\infty) = \beta(-\infty) = \tfrac12$. Since $\beta$ assigns positive mass to each boundary point, both boundary points are extreme and consequently every continuous function is $T$-continuous. From Theorem 11-7 we have

$$G_{0j} = \tfrac12\, {}^0N(+\infty, j) + \tfrac12\, {}^0N(-\infty, j),$$

and for sums of independent random variables we have also

$$C_{ij} = G_{ij} = G_{0, j-i}.$$

Therefore

$$C_{ij} = G_{ij} = \begin{cases} \tfrac12|j - i| + \tfrac13[1 - (-2)^{i-j}] & i \le j, \\ \tfrac12|j - i| & j \le i. \end{cases}$$
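The formula for $C_{ij}$ can be cross-checked against Theorem 11-7. The sketch below (exact arithmetic, illustrative names) builds $G_{0j}$ from the two boundary kernels with weights $\tfrac12$, confirms that it agrees with the closed form for $C_{0j}$, and checks that row 0 of $G(I - P)$ equals $-\delta_{0j}$, which is $1\beta - I$ when $\beta$ puts no mass on $S$. It uses $P_{j+1,j} = 2/3$ and $P_{j-2,j} = 1/3$ for this walk.

```python
from fractions import Fraction

def pw(k):
    """(-2)**k as an exact rational (k may be negative)."""
    return Fraction(-2) ** k

def N_plus(j):   # 0N(+inf, j)
    return j + Fraction(1, 3) * (1 - pw(-j)) if j >= 0 else Fraction(0)

def N_minus(j):  # 0N(-inf, j)
    return Fraction(1, 3) * (1 - pw(-j)) if j >= 0 else Fraction(-j)

def G0(j):       # Theorem 11-7 with beta(+inf) = beta(-inf) = 1/2
    return (N_plus(j) + N_minus(j)) / 2

def C(i, j):     # closed form C_ij = G_ij = G_{0, j-i}
    d = j - i
    return Fraction(abs(d), 2) + (Fraction(1, 3) * (1 - pw(-d)) if d >= 0 else 0)

def row0_of_G_I_minus_P(j):
    """(G(I - P))_{0j} = G_0j - 2/3 G_{0,j+1} - 1/3 G_{0,j-2}."""
    return G0(j) - Fraction(2, 3) * G0(j + 1) - Fraction(1, 3) * G0(j - 2)

R = range(-15, 16)
print(all(G0(j) == C(0, j) for j in R))                                   # True
print(all(row0_of_G_I_minus_P(j) == (-1 if j == 0 else 0) for j in R))   # True
```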
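Let us see what the representation theorem says for this process. We first need to know ${}^0\hat N(x, i)$. We can take $\alpha_i = 1$ for all $i$, and we see that $\hat P_{ij} = P_{ji}$ and ${}^0\hat N_{ij} = {}^0N_{ji}$. For $\hat P$ we again have two boundary points, and

$${}^0\hat N(+\infty, i) = \begin{cases} \tfrac13[1 - (-2)^i] & i \le 0, \\ i & 0 \le i, \end{cases} \qquad {}^0\hat N(-\infty, i) = \begin{cases} \tfrac13[1 - (-2)^i] - i & i \le 0, \\ 0 & 0 \le i. \end{cases}$$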


Now let $h \ge 0$ be a function whose deviation $f$ is totally finite, and suppose $Cf$ is finite-valued. The representation theorem gives

$$h_i = h_0 + ({}^0Nf)_i - (\alpha f)\big[\pi(+\infty)\, {}^0\hat N(+\infty, i) + \pi(-\infty)\, {}^0\hat N(-\infty, i)\big].$$

If we use the identity

$${}^0N_{ij} - (\alpha_j/\alpha_0)\, C_{i0} = C_{0j} - C_{ij},$$

we obtain

$$h_i = -(Cf)_i + (\alpha f)\big[\pi(-\infty) - \tfrac12\big]\, i + \big(h_0 + (Cf)_0\big).$$

This last equation is an example of the formula

$$h_i = -(Cf)_i + ai + b$$

discussed in Section 7.
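The passage from the representation theorem to $h_i = -(Cf)_i + ai + b$ rests on two identities: Proposition 9-45 with $\alpha_j = 1$, and the fact that the coefficient of $\alpha f$ collapses to $[\pi(-\infty) - \tfrac12]\,i$. The sketch below checks both exactly on a window of states, using the closed forms above; the weight $\pi(-\infty) = 1/5$ is an arbitrary hypothetical choice.

```python
from fractions import Fraction

def pw(k):
    return Fraction(-2) ** k

def N0(i, j):
    """0N_{ij} for Example 1 (closed form, with 0N_{0j} = 0)."""
    if j == 0:
        return Fraction(0)
    if j > 0:
        if i <= 0:
            return Fraction(1, 3) * (1 - pw(i)) * (1 - pw(-j))
        if i <= j:
            return i + Fraction(1, 3) * (pw(i - j) - pw(-j))
        return j + Fraction(1, 3) * (1 - pw(-j))
    if i <= j:
        return Fraction(1, 3) * pw(i) * (pw(-j) - 1) - j
    if i <= 0:
        return Fraction(1, 3) * (1 - pw(i)) - i
    return Fraction(0)

def C(i, j):
    d = j - i
    return Fraction(abs(d), 2) + (Fraction(1, 3) * (1 - pw(-d)) if d >= 0 else 0)

def Nhat_plus(i):    # 0N^(+inf, i)
    return Fraction(1, 3) * (1 - pw(i)) if i <= 0 else Fraction(i)

def Nhat_minus(i):   # 0N^(-inf, i)
    return Fraction(1, 3) * (1 - pw(i)) - i if i <= 0 else Fraction(0)

R = range(-12, 13)
# Proposition 9-45 with alpha_j = 1:  0N_ij - C_i0 = C_0j - C_ij
prop_9_45_ok = all(N0(i, j) - C(i, 0) == C(0, j) - C(i, j) for i in R for j in R)

# Coefficient of (alpha f) in h_i once (0N f)_i is rewritten via Proposition 9-45
pi_minus = Fraction(1, 5)                 # hypothetical value of pi(-inf)
pi_plus = 1 - pi_minus
bracket_ok = all(
    C(i, 0) - pi_plus * Nhat_plus(i) - pi_minus * Nhat_minus(i)
    == (pi_minus - Fraction(1, 2)) * i
    for i in R)
print(prop_9_45_ok, bracket_ok)   # True True
```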


EXAMPLE 2: Basic example, null case.

For the basic example we have

$${}^0N_{ij} = \begin{cases} \beta_j/\beta_i & \text{if } j \ge i > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Hence, for fixed $j$, ${}^0N_{ij} = 0$ if $i$ is sufficiently large. Consequently there is a single limit point $p$ in ${}^*S$, and ${}^0N(p, i) = 0$ for all $i$. Since ${}^0N(0, i) = 0$ for all $i$ as well, we have $p = 0$; the limit point is the state 0. Thus the boundary is empty.

We know from Section 9-6 that in a null basic example $\lambda^E_0 = 1$ if $E$ is a finite set containing zero. Therefore the harmonic measure $\beta$, which is the weak-star limit of such measures, assigns unit mass to state 0. Thus by Theorem 11-7, $G_{0j} = {}^0N(0, j) = 0$, in agreement with the result obtained in Chapter 9. This example shows that the condition on one-point sets in Proposition 11-15 cannot be omitted.

We can also check directly, using the results of Section 9-6, that $G_{ij} = {}^iN(0, j)$ and $G(I - P) = 1\beta - I$ (with $\beta$ the unit mass at 0), again in agreement with Theorem 11-7.

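The reverse $\hat P$ of the basic example has

$${}^0\hat N_{ij} = \begin{cases} 1 & \text{if } i \ge j > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Again there is one limit point $p'$, but this time we have ${}^0\hat N(p', j) = 1 - \delta_{0j}$. The measure $J(p', \cdot)$ associated with the transient chain

$$\hat Q_{ij} = \begin{cases} \hat P_{ij} & \text{if } i \ne 0, \\ 0 & \text{if } i = 0, \end{cases}$$

satisfies $J(p', j) = {}^0\hat N(p', j) + \delta_{0j} = 1$ and is easily seen to be $\hat Q$-regular, and it follows that $p'$ is a boundary point and is extreme. Put $+\infty = p'$.

It is clear that $\hat\lambda^E_m = 1$ if $m$ is the largest element of the finite set $E$ and if $P$ is null. Hence $\hat\beta(+\infty) = 1$, in agreement with Proposition 11-15. Therefore $\hat G_{ij} = {}^i\hat N(+\infty, j)$, and we find that $\hat G(I - \hat P) = -I$.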


EXAMPLE 3: Three-line example.

This example is designed to show that $P$ and $\hat P$ can have different limiting behavior and to show that in a normal null chain some additional assumption is needed (such as $B = B^e$ in Theorem 11-14) to ensure that each row of $P^n$ converges weak-star to a limiting measure.

Let $S$ consist of three copies of the non-negative integers with typical elements denoted by $i$, $i'$, and $i''$, respectively; we refer to the three lines as $L$, $L'$, and $L''$. The process $P$ moves deterministically to the left on the first and third lines and moves from $0$ or $0''$ to $0'$ also with probability one. On the middle line it moves one step to the right or moves to one of the other lines, as shown in the accompanying figure. The quantities on the arrows are transition probabilities, and $p$ and $q$ are positive numbers with sum one. The $p_i$ and $q_i$ are chosen as in the basic example, except that $q_0 = 0$; $\beta_n$ is defined as the product of the $p_i$'s up through $p_{n-1}$, as in the basic example, and we require that $\beta_n \to 0$ and $\sum \beta_n = +\infty$.


[Figure: transition diagram for the three-line example, with the transition probabilities on the arrows.]

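Clearly $H_{0'0'} = 1 - \lim \beta_n = 1$, and thus $P$ is recurrent. If $\alpha$ is defined by

$$\alpha_i = p\beta_i, \qquad \alpha_{i'} = \beta_i, \qquad \alpha_{i''} = q\beta_i,$$

then $\alpha$ is a regular measure. Since $\alpha 1 = 2\sum \beta_n = +\infty$, the chain is null.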
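To see that $P$ is normal, we compute $\lambda$ and $\hat\lambda$. Since $P$ is null, the process after a long time is likely to be far to the right of any state we consider. Thus

$$\lambda_a = \begin{cases} p & \text{if } a \in L, \\ q & \text{if } a \in L'', \\ 0 & \text{if } a \in L' \text{ and } a \ne 0'. \end{cases}$$

The reverse process $\hat P$ moves a step to the right on $L$ (or $L''$) or switches to $L'$. On $L'$ it moves deterministically to the left until it reaches $0'$, and then it goes to $0$ with probability $p$ and $0''$ with probability $q$. Thus

$$\hat\lambda_a = \begin{cases} 0 & \text{if } a \in L \text{ or } a \in L'', \\ 1 & \text{if } a \in L' \text{ and } a \ne 0'. \end{cases}$$

By Theorem 9-26, $P$ is normal.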
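We now determine the boundaries of $P$ and $\hat P$. If we choose $0'$ as the distinguished state, we see that if $a$ and $b$ are any two states different from $0'$, not necessarily on the same line, then

$${}^{0'}N_{ab} = \begin{cases} 1 & \text{if } a, b \in L \text{ or } a, b \in L'', \ a \text{ to the right of } b, \\ p & \text{if } a \in L', \ b \in L, \ a \text{ to the right of } b, \\ q & \text{if } a \in L', \ b \in L'', \ a \text{ to the right of } b, \\ \alpha_b/\alpha_a & \text{if } a \in L', \ a \text{ to the left of } b, \\ 0 & \text{otherwise.} \end{cases}$$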
Thus for each $b$, ${}^{0'}N_{ab}$ tends to a limit along each line, but, as functions of $b$, the limits are different for each line. The result is three boundary points $\infty$, $\infty'$, and $\infty''$, and we have

$${}^{0'}N(\infty, b) = \begin{cases} 1 & \text{if } b \in L, \\ 0 & \text{otherwise,} \end{cases} \qquad
{}^{0'}N(\infty', b) = \begin{cases} p & \text{if } b \in L, \\ q & \text{if } b \in L'', \\ 0 & \text{otherwise,} \end{cases} \qquad
{}^{0'}N(\infty'', b) = \begin{cases} 1 & \text{if } b \in L'', \\ 0 & \text{otherwise.} \end{cases}$$

The measures $J(x, \cdot)$ defined for the associated transient chain satisfy

$$J(x, b) = {}^{0'}N(x, b) + \delta_{0'b},$$

and direct calculation shows that

$$J(\infty', \cdot) = p\,J(\infty, \cdot) + q\,J(\infty'', \cdot).$$

We conclude that $\infty$ and $\infty''$ are extreme boundary points, whereas $\infty'$ is not.

If $E$ is a large finite set of states containing $0$ and $0''$, then the chain $P$ is most likely to be to the right of $E$. Hence $\lambda^E_k = p$ and $\lambda^E_{m''} = q$, where $k$ and $m''$ are the last elements of $E$ on the first and third lines, respectively. From the form of $\lambda^E$, we deduce that

$$\beta(\infty) = p \quad \text{and} \quad \beta(\infty'') = q.$$


We may interpret this result as follows: $L$, $L'$, and $L''$ are neighborhoods of $\infty$, $\infty'$, and $\infty''$, respectively. In the long run, the chain is typically far to the right in one of these sets. If it is in $L$ or $L''$, it must remain there for a long time; but if it is in $L'$, it can leave in one step by switching to $L$ or $L''$. This behavior is what makes $\infty'$ nonminimal. In other words, far out in $L$ is near $\infty$, far out in $L''$ is near $\infty''$, but far out in $L'$ means near $\infty$ with probability $p$ and near $\infty''$ with probability $q$.
Now let us consider the boundary of $\hat P$. We are to look at the limiting behavior of ${}^{0'}\hat N_{ab} = \alpha_b\, {}^{0'}N_{ba}/\alpha_a$ along sequences of $a$'s. But for fixed $b$, this quantity tends to the same limit along all three lines. We thus have just one limit point $\hat\infty$, and the corresponding measure is

$$J(\hat\infty, b) = {}^{0'}\hat N(\hat\infty, b) + \delta_{b0'} = \begin{cases} 1 & \text{if } b \in L', \\ 0 & \text{otherwise.} \end{cases}$$

This measure is regular for $\hat Q$, and consequently $\hat\infty$ is an extreme boundary point. By Proposition 11-15, $\hat\beta(\hat\infty) = 1$. For $\hat P$ the chain either is far out in $L'$ (and hence has to stay there for a long time) or else is in a set from which it can move in one step at any moment to a position far out in $L'$. The three boundary points for $P$ collapse into one for $\hat P$ because a position far out on $L$ or $L''$ is only one step away from being far out on $L'$.

We conclude by sketching a proof that the $0'$th row of $P^n$ does not converge weak-star as $n$ tends to infinity if the $p_i$'s are chosen appropriately. Thus some condition such as $B = B^e$ is needed in Theorem 11-14, and some condition such as $T$-continuity is needed in Theorem 11-13.

Let $h$ be the characteristic function of $L'$. Then $h$ is continuous, since $h(\infty) = h(\infty'') = 0$ and $h(\infty') = 1$; but $(Th)(\infty') = 0$ and thus $h$ is not $T$-continuous. Let

$$a_n = (P^n h)_{0'} = \Pr_{0'}[x_n \in L'].$$

We shall show that $\{a_n\}$ does not converge if the $p_i$'s are chosen suitably, and hence $P^n_{0'\cdot}$ does not converge weak-star. In fact, we shall indicate that $\{a_n\}$ can fail to be even Abel summable. We define

$$A(t) = \sum_n a_n t^n, \qquad P(t) = \sum_n P^n_{0'0'} t^n, \qquad B(t) = \sum_n \beta_n t^n, \qquad F(t) = \sum_n F^{(n)}_{0'0'} t^n.$$

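If we let $k$ be the last time up to $n$ that the process is in $0'$, we see that

$$a_n = \sum_{k=0}^{n} P^k_{0'0'}\, \beta_{n-k},$$

and hence

$$A(t) = P(t)B(t).$$

For any chain, we have

$$P(t) = \frac{1}{1 - F(t)}.$$

For this chain,

$$F^{(2n+2)}_{0'0'} = \beta_n q_n = \beta_n - \beta_{n+1},$$

whereas $F^{(m)}_{0'0'} = 0$ if $m$ is not of the form $2n + 2$. Thus

$$F(t) = 1 - (1 - t^2)B(t^2).$$

Combining our results, we have

$$A(t) = \frac{B(t)}{(1 - t^2)B(t^2)}.$$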
The Abel limit of the sequence $a_n$ is

$$\lim_{t \to 1^-} (1 - t)A(t) = \frac12 \lim_{t \to 1^-} \frac{B(t)}{B(t^2)}.$$

Now the sequence $\{\beta_n\}$ which defines $B(t)$ is an arbitrary decreasing sequence of positive numbers subject only to the conditions that $\lim \beta_n = 0$, $\sum \beta_n = +\infty$, and $\beta_0 = \beta_1 = 1$. Such a sequence can be found so that the expression $B(t)/B(t^2)$ oscillates as $t \to 1$.
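The renewal argument behind $A(t) = P(t)B(t)$ and $F(t) = 1 - (1 - t^2)B(t^2)$ can be checked by brute force. The sketch below assumes a hypothetical sequence $\beta_0 = \beta_1 = 1$, $\beta_n = 1/n$ for $n \ge 2$ (so $p_i = \beta_{i+1}/\beta_i$), and a switching convention in which a jump from $i'$ lands at $i$ on $L$ or $L''$; it then compares $a_n$ obtained by pushing the exact distribution through the chain with $a_n = \sum_k P^k_{0'0'}\beta_{n-k}$ built from $F^{(2k+2)}_{0'0'} = \beta_k - \beta_{k+1}$.

```python
from collections import defaultdict
from fractions import Fraction

def beta(n):
    """Hypothetical choice: beta_0 = beta_1 = 1 and beta_n = 1/n for n >= 2."""
    return Fraction(1) if n <= 1 else Fraction(1, n)

def p(i):
    """p_i = beta_{i+1} / beta_i, so that beta_n = p_0 p_1 ... p_{n-1}."""
    return beta(i + 1) / beta(i)

def a_direct(n_max, p_line=Fraction(1, 2)):
    """a_n = Pr_{0'}[x_n in L'] by pushing the exact distribution forward.

    Lines: 'L', 'M' (the middle line L') and 'R' (the line L'').  Assumed
    convention: from i' go to (i+1)' w.p. p_i, to i on L w.p. p_line * q_i,
    and to i on L'' w.p. (1 - p_line) * q_i.
    """
    dist = {("M", 0): Fraction(1)}
    out = []
    for _ in range(n_max + 1):
        out.append(sum(m for (line, _i), m in dist.items() if line == "M"))
        new = defaultdict(Fraction)
        for (line, i), m in dist.items():
            if line == "M":
                qi = 1 - p(i)
                new[("M", i + 1)] += m * p(i)
                new[("L", i)] += m * qi * p_line
                new[("R", i)] += m * qi * (1 - p_line)
            elif i > 0:
                new[(line, i - 1)] += m    # deterministic step to the left
            else:
                new[("M", 0)] += m         # 0 and 0'' both go to 0'
        dist = new
    return out

def a_renewal(n_max):
    """a_n = sum_k u_k beta_{n-k}, with u the renewal sequence of 0' (u = 1/(1 - F))."""
    f = [Fraction(0)] * (n_max + 1)
    for k in range((n_max - 2) // 2 + 1):
        f[2 * k + 2] = beta(k) - beta(k + 1)      # F^{(2k+2)} = beta_k q_k
    u = [Fraction(1)] + [Fraction(0)] * n_max
    for n in range(1, n_max + 1):
        u[n] = sum(f[m] * u[n - m] for m in range(1, n + 1))
    return [sum(u[k] * beta(n - k) for k in range(n + 1)) for n in range(n_max + 1)]

print(a_direct(30) == a_renewal(30))   # True
```

For this particular choice $B(t) = 1 - \log(1 - t)$, so $B(t)/B(t^2) \to 1$ and the Abel limit is $\tfrac12$; producing the oscillation described in the text requires a more delicate choice of $\{\beta_n\}$.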


9. Problems 


1. For sums of independent random variables on the integers with $p_{-1} = 2/3$ and $p_2 = 1/3$, show that if $h$ is continuous and is of finite deviation, then $\alpha f = 0$.

2. Prove that if $h$ and $Th$ are both continuous, then $h$ is $T$-continuous.

3. Show that for any normal chain $(I - P)C = b\alpha - I$. Identify the vector $b$.

4. What identities hold for (I — P)K and K(I — P) in a normal chain? 

5. Let $P$ be normal and let $h$ be continuous. The balayage potential of $h$ on a small ergodic set $E$ is $B^E h - (\lambda^E h)1$ (see Proposition 9-43). Let $\{E_k\}$ be an increasing sequence of finite sets with union $S$. Prove that the balayage potentials of $h$ on $E_k$ converge to $h - (\int h\, d\beta)1$ on $S$.

6. Show that if $P$ is a normal null chain with $B = B^e$ and if $x$ is a point of the boundary with a neighborhood in $S^*$ containing no other limit points, then $\lim_n \Pr[x_n \in E]$ exists for all sufficiently small neighborhoods $E$ and is the same for all such sets $E$.

7. Prove that in a normal chain the elementary continuous functions are exactly the functions that can be written as the sum of a constant and a potential of finite support. [Hint: Use Theorem 9-15 and Proposition 11-18.]

8. Prove that if $P$ is a normal null chain and if the $j$th column $P_{\cdot j}$ of $P$ has only finitely many non-zero entries, then $\beta_j = 0$. [Hint: Consider columns of $I - P$ as charges.]

9. Show that every $T$-continuous potential in a normal chain is the uniform limit of potentials of finite support.


Problems 10 to 15 refer to the null chain of Chapter 9, Problems 11 to 22. 
We shall use the notation and results there developed. 
10. Show that the recurrent entrance boundary is empty and that $\beta_0 = 1$.


11. Show that $\hat B$ is the unit interval when parametrized by $t = \lim (x/n)$ and that
that 


S(t, z) = K — eoa — t-t, 


12. What is the form of the most general non-negative function regular for P 
except at 0? 
13. In Problem 12, let $h_0 = 0$ and use Lebesgue measure on $\hat B$. Verify that $h$ is regular except at 0.


14. Show that $\hat\beta$ is Lebesgue measure on $\hat B$.

15. Verify the interpretations of $\beta$ and $\hat\beta$ as limits of rows of $P^n$ and $\hat P^n$ by showing this: For either chain $n$ is most probably large in the long run. For $\hat P$, the ratio $x/n$ cannot change much in a few steps (if $n$ is large), but for $P$ a transition to 0 is always possible.


CHAPTER 12 


INTRODUCTION TO RANDOM FIELDS 


DAVID GRIFFEATH


1. Markov fields 


One means of generalizing denumerable stochastic processes $\{x_n\}$ with time parameter set $N = \{0, 1, \ldots\}$ is to consider random fields $\{x_t\}$, where $t$ takes on values in an arbitrary countable parameter set $T$. Roughly, a random field with denumerable state space $S$ is described by a probability measure $\mu$ on the space $\Omega = S^T$ of all configurations of values from $S$ on the generalized time set $T$. In this chapter we discuss certain extensions of Markov chains, called Markov fields, which have been important objects of study in the recent development of probability theory. Only some of the highlights of this rich theory will be covered; we concentrate especially on the case $T = Z$, the integers, where the connections with classical Markov chain theory are deepest.

Proceeding to the formal definitions, assume as usual that the state space $S$ is a countable set of integers including 0, but let the time parameter set $T$ be any countable set. The configuration space $\Omega = S^T$ is the space of all functions $\omega$ from $T$ to $S$. An element $\omega = \{\omega_t; t \in T\}$ of $\Omega$ is called a configuration, and is to be thought of as an assignment of values from $S$ to the sites $t$ of $T$. The outcome function $x_t$ from $\Omega$ to $S$ takes the configuration $\omega$ to its value $\omega_t$ at site $t$. Let $\mathcal{F}$ be the minimal complete $\sigma$-algebra with respect to which all the outcome functions $x_t$, $t \in T$, are measurable. In this context, we introduce the following definition.


Definition 12-1: A random field is given by $(\Omega, \mathcal{F}, \mu, \{x_t\})$, where $\mu$ is a probability measure on $(\Omega, \mathcal{F})$ such that

$$\Pr[x_t = i_t;\ t \in A] > 0$$

for all finite (non-empty) $A \subset T$ and arbitrary $i_t \in S$. (As always, $\Pr[p] = \mu(\{\omega \mid p\})$.)

As in the case of stochastic processes, we often identify a random field with its outcome functions $\{x_t\}$ (the remaining structure being understood). At other times it will be more convenient to think of $\mu$ as the random field. Our positivity assumption on cylinder probabilities ensures that all conditional probability statements are well-defined. The role of transition probabilities at this new level of generality is played by characteristics, which we define next.


Definition 12-2: Given a random field $\{x_t\}$, and finite (non-empty) sets $A$ and $\Lambda$ such that $A \subset \Lambda$, the $(A, \Lambda)$-characteristic is the real-valued function on $\Omega$ given by

$$\mu^\Lambda_A(\iota) = \Pr[x_t = i_t \text{ for all } t \in A \mid x_t = i_t \text{ for all } t \in \Lambda - A]$$

when evaluated at the configuration $\iota = \{i_t; t \in T\}$. For $a \in \Lambda$, we abbreviate $\mu^\Lambda_a = \mu^\Lambda_{\{a\}}$; the collection $\{\mu^\Lambda_a; a \in \Lambda \subset T\}$ is called the local characteristics of the random field.

Throughout this chapter $A$ and $\Lambda$ will always be finite subsets of $T$, even when not explicitly identified as such.

Our immediate objective is to formulate the notion of a Markov field. 
As motivation, we return briefly to the setting of Chapter 4. 


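Definition 12-3: A denumerable stochastic process $\{x_n\}$ satisfies the two-sided Markov property if

$$\Pr[x_n = i_n \mid x_k = i_k;\ k \in \{m, m+1, \ldots, M\} - \{n\}] = \begin{cases} \Pr[x_0 = i_0 \mid x_1 = i_1] & \text{if } 0 = m = n < M, \\ \Pr[x_n = i_n \mid x_{n-1} = i_{n-1} \wedge x_{n+1} = i_{n+1}] & \text{if } 0 \le m < n < M, \end{cases}$$

whenever $\Pr[x_k = i_k,\ m \le k \le M] > 0$.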


Proposition 12-4: Any Markov chain $\{x_n\}$ satisfies the two-sided Markov property.


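PROOF: Let $\pi$ be the starting distribution and $P$ the transition matrix for $\{x_n\}$. When $\Pr[x_k = i_k,\ m \le k \le M] > 0$, we consider the quantity $\Pr[x_n = i_n \mid x_k = i_k;\ k \in \{m, m+1, \ldots, M\} - \{n\}]$. If $0 = m = n < M$ this is simply $\Pr[x_0 = i_0 \mid x_1 = i_1]$ by reversibility of the Markov property (Proposition 6-44). Otherwise the above conditional probability may be evaluated as

$$\begin{aligned}
&\frac{(\pi P^m)_{i_m} \prod_{k=m}^{M-1} P_{i_k i_{k+1}}}{(\pi P^m)_{i_m} \Big(\prod_{k=m}^{n-2} P_{i_k i_{k+1}}\Big) P^{(2)}_{i_{n-1} i_{n+1}} \Big(\prod_{k=n+1}^{M-1} P_{i_k i_{k+1}}\Big)}
= \frac{P_{i_{n-1} i_n} P_{i_n i_{n+1}}}{P^{(2)}_{i_{n-1} i_{n+1}}} \\
&\qquad = \frac{(\pi P^{n-1})_{i_{n-1}} P_{i_{n-1} i_n} P_{i_n i_{n+1}}}{(\pi P^{n-1})_{i_{n-1}} P^{(2)}_{i_{n-1} i_{n+1}}}
= \Pr[x_n = i_n \mid x_{n-1} = i_{n-1} \wedge x_{n+1} = i_{n+1}].
\end{aligned}$$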
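The cancellation in this proof can be confirmed by brute force on a small example. The sketch below uses a hypothetical 3-state chain and starting distribution with all entries positive (so every cylinder has positive probability), enumerates all paths $x_0, \ldots, x_4$, and checks that conditioning $x_n$ on all other coordinates gives $P_{i_{n-1}i_n}P_{i_ni_{n+1}}/P^{(2)}_{i_{n-1}i_{n+1}}$ for every interior $n$.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical 3-state chain and starting distribution, all entries positive.
P = [[Fr(1, 2), Fr(1, 4), Fr(1, 4)],
     [Fr(1, 3), Fr(1, 3), Fr(1, 3)],
     [Fr(1, 5), Fr(3, 5), Fr(1, 5)]]
pi = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]
S = range(3)
M = 4                                   # paths x_0, ..., x_M (here m = 0)

def path_prob(path):
    """Pr[x_k = path[k], 0 <= k <= M] = pi_{i_0} * prod of P_{i_k i_{k+1}}."""
    pr = pi[path[0]]
    for a, b in zip(path, path[1:]):
        pr *= P[a][b]
    return pr

def two_step(a, b):
    """P^(2)_{ab}."""
    return sum(P[a][k] * P[k][b] for k in S)

def two_sided_ok(n):
    """Check Pr[x_n = i_n | x_k, k != n] = P_{i_{n-1} i_n} P_{i_n i_{n+1}} / P^(2)_{i_{n-1} i_{n+1}}."""
    for path in product(S, repeat=M + 1):
        variants = [path[:n] + (s,) + path[n + 1:] for s in S]
        lhs = path_prob(path) / sum(path_prob(v) for v in variants)
        a, i, b = path[n - 1], path[n], path[n + 1]
        if lhs != P[a][i] * P[i][b] / two_step(a, b):
            return False
    return True

print(all(two_sided_ok(n) for n in range(1, M)))   # True
```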
The two-sided Markov property, unlike the ordinary Markov property, generalizes to any parameter set $T$ which has a neighbor system, i.e., a collection $\partial = \{\partial a; a \in T\}$ of finite subsets of $T$ such that (i) $a \notin \partial a$, and (ii) $a \in \partial b$ if and only if $b \in \partial a$, $a, b \in T$. The sites $b \in \partial a$ are called the neighbors of $a$. We write $\bar a = \{a\} \cup \partial a$. Also, for $A \subset T$ let

$$\partial A = \{b \in T - A : b \in \partial a \text{ for some } a \in A\}; \qquad \bar A = A \cup \partial A.$$
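A minimal Python sketch of these definitions, assuming sites are hashable labels and that `nbrs` (a hypothetical name) maps each site $a$ to the set $\partial a$. It checks conditions (i) and (ii) and computes $\partial A$ and $\bar A$ on a five-site path.

```python
def is_neighbor_system(nbrs):
    """Conditions (i) a not in ∂a and (ii) a ∈ ∂b iff b ∈ ∂a."""
    no_self = all(a not in nbrs[a] for a in nbrs)
    symmetric = all(a in nbrs[b] for a in nbrs for b in nbrs[a])
    return no_self and symmetric


def boundary(A, nbrs):
    """∂A = {b in T - A : b ∈ ∂a for some a ∈ A}."""
    return {b for a in A for b in nbrs[a]} - set(A)


def closure(A, nbrs):
    """Ā = A ∪ ∂A."""
    return set(A) | boundary(A, nbrs)


# Hypothetical example: the path 0 - 1 - 2 - 3 - 4.
path = {t: {s for s in (t - 1, t + 1) if 0 <= s <= 4} for t in range(5)}
print(is_neighbor_system(path))        # True
print(sorted(boundary({1, 2}, path)))  # [0, 3]
print(sorted(closure({4}, path)))      # [3, 4]
```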


Definition 12-5: Let $T$ have neighbor system $\partial$. The random field $\{x_t; t \in T\}$ is a Markov field (with respect to $\partial$) if

$$\mu_a^\Lambda = \mu_a^{\bar a} \quad \text{whenever } \bar a \subset \Lambda \subset T,\ \Lambda \text{ finite}.$$

We shall usually assume an underlying neighbor system for $T$, and simply refer to the Markov field $\{x_t\}$. Note that any Markov chain with strictly positive cylinder probabilities is a random field, where $T = N$. The natural neighbor structure on $N$ is $\partial 0 = \{1\}$, and for $n \ge 1$, $\partial n = \{n - 1, n + 1\}$. In this case the Markov random field condition is precisely the two-sided Markov property. Proposition 12-4 shows that any Markov chain with positive cylinders may be considered as a Markov field on $\Omega = S^N$. Later we will see that the classes of Markov processes with positive cylinders and Markov fields on $S^N$ actually coincide.

A random field is called finite when $T$ is a finite set. Such fields have an elementary theory, which will be developed in the next two sections. First, though, we note that the Markov field property simplifies somewhat when $T$ is finite.


Proposition 12-6: Let $\{x_t\}$ be a finite random field. Then the following three conditions are equivalent:

(1) $\{x_t\}$ is a Markov field.
(2) $\mu_a^T = \mu_a^{\bar a}$ for all $a \in T$.
(3) $\mu_a^T(\iota) = \mu_a^T(\iota')$ whenever $a \in T$ and $i_t = i'_t$ for all $t \in \bar a$.
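Proof: When $T$ is finite, (1) implies (2) because (2) is simply the Markov field property with $\Lambda = T$, while the fact that $\mu_a^{\bar a}$ depends only on $t \in \bar a$ together with (2) implies $\mu_a^T(\iota) = \mu_a^{\bar a}(\iota) = \mu_a^{\bar a}(\iota') = \mu_a^T(\iota')$ whenever $\iota$ and $\iota'$ agree on $\bar a$. To see that (3) implies (2), fix $a \in T$ and let $\kappa = \{k_t; t \in T - \bar a\}$ be any prescription of values from $S$ on $T - \bar a$. Denote by $p$, $q$, and $r_\kappa$ the statements

$$x_a = i_a, \qquad x_t = i_t \text{ for all } t \in \partial a, \qquad x_t = k_t \text{ for all } t \in T - \bar a,$$

respectively. Then (3) asserts that $\Pr[p \mid q \wedge r_\kappa] = c$ for all $\kappa$, or equivalently,

$$\Pr[p \wedge q \wedge r_\kappa] = c \Pr[q \wedge r_\kappa].$$

Summing over all possible $\kappa$, we obtain $\Pr[p \mid q] = c$. Thus $\Pr[p \mid q \wedge r_\kappa] = \Pr[p \mid q]$, which is precisely (2) when $k_t = i_t$ for all $t \in T - \bar a$. Finally, to show that (2) implies (1) we choose $\bar a \subset \Lambda \subset T$ and $\iota = \{i_t\} \in \Omega$. Since $\mu_a^T$ depends only on sites in $\bar a$, (2) yields

$$\Pr[x_t = i_t \text{ for all } t \in \Lambda \wedge x_t = k_t \text{ for all } t \in T - \Lambda] = \Pr[x_t = i_t \text{ for all } t \in \Lambda - \{a\} \wedge x_t = k_t \text{ for all } t \in T - \Lambda]\, \mu_a^T(\iota),$$

where $\kappa = \{k_t; t \in T - \Lambda\}$ is any prescription of values from $S$ on $T - \Lambda$. Summing over all possible $\kappa$, we conclude that $\mu_a^\Lambda(\iota) = \mu_a^T(\iota) = \mu_a^{\bar a}(\iota)$.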


2. Finite Gibbs fields 


In this section we introduce an extremely useful representation for the measure $\mu$ of an arbitrary finite random field. The inspiration behind this approach (and hence most of its terminology) is derived from statistical mechanics, where random fields may be considered as equilibrium distributions for a variety of physical systems.


Definition 12-7: A potential $U$ on a finite set $T$ is a family $\{U_A(\cdot); A \subset T\}$ of functions from $\Omega$ to the real line $R$ with the property that $U_A(\iota) = U_A(\iota')$ whenever $i_t = i'_t$ for all $t \in A$, and such that $U_\emptyset = 0$. The energy $H_U$ of the potential $U$ is given by

$$H_U = \sum_{A \subset T} U_A.$$

$U$ is said to be normalized if $U_A(\iota) = 0$ whenever $i_a = 0$ for some $a \in A$. When $T$ has a neighbor system $\partial$, a set $C \subset T$ is called a clique if $b \in \partial a$ whenever $a, b \in C$, $a \ne b$, i.e. if every two distinct sites in $C$ are neighbors. Let $\mathcal{C}$ be the class of all cliques in $T$. $U$ is called a neighbor potential if $U_A = 0$ whenever $A \notin \mathcal{C}$.


Definition 12-8: A finite random field $\{x_t\}$ is a Gibbs field with potential $U$ if

$$\mu(\{\iota\}) = z^{-1} e^{H_U(\iota)}, \qquad \iota \in \Omega,$$

where $z = \sum_{\iota \in \Omega} \exp\{H_U(\iota)\}$ is often called the partition function. Implicit is the assumption $z < +\infty$, which imposes a condition on $U$. If $U$ is a neighbor potential, then $\{x_t\}$ is called a neighbor Gibbs field, and $H_U = \sum_{C \in \mathcal{C}} U_C$.

We remark that the potential and energy of random field theory 
should not be confused with those of Markov chain theory presented in 
earlier chapters. These terms have common origins in classical physics. 


Example 12-9: $T$ is sometimes called a cubic lattice if no clique in the neighbor system for $T$ has more than two elements. The most important examples are subsets of the $d$-dimensional integer lattices $Z^d$, where $\partial a = \{b \in T : |a - b| = 1\}$. When $T$ is a finite cubic lattice and $U$ is a neighbor potential, the energy function becomes

$$H_U = \sum_{a \in T} U_{\{a\}} + \sum_{\{a, b\} \in \mathcal{N}} U_{\{a, b\}}, \qquad \text{where } \mathcal{N} = \{\{a, b\} : b \in \partial a\}.$$

In this case $U$ is called a neighbor pair potential.
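To make Definitions 12-7 and 12-8 concrete, the sketch below builds a finite neighbor Gibbs field from a neighbor pair potential on a three-site path with $S = \{0, 1\}$. The parameters `h` and `J` are illustrative assumptions; the potential $U_{\{a\}}(\iota) = h\, i_a$, $U_{\{a,b\}}(\iota) = J\, i_a i_b$ happens to be normalized, so $H_U(0) = 0$ and $\mu(\{0\}) = z^{-1}$.

```python
import math
from itertools import product

# Hypothetical neighbour pair potential on the path T = {0, 1, 2}, S = {0, 1}.
T = (0, 1, 2)
S = (0, 1)
edges = [(0, 1), (1, 2)]   # the two-element cliques
h = 0.3                    # U_{a}(iota) = h * i_a
J = 0.8                    # U_{a,b}(iota) = J * i_a * i_b for neighbours a, b


def energy(iota):
    """H_U(iota): singleton terms plus neighbour-pair terms."""
    return sum(h * iota[a] for a in T) + sum(J * iota[a] * iota[b] for a, b in edges)


configs = list(product(S, repeat=len(T)))
z = sum(math.exp(energy(c)) for c in configs)   # partition function
mu = {c: math.exp(energy(c)) / z for c in configs}

print(abs(sum(mu.values()) - 1) < 1e-12)  # True
print(mu[(0, 0, 0)] == 1 / z)             # True, since H_U(0) = 0
```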


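Two lemmas prepare the way for the representation theorem for finite random fields. Given $\iota = \{i_t\}$ and $A \subset T$, the modification $\iota^A = \{i_t^A\}$ of $\iota$ has values

$$i_t^A = \begin{cases} i_t & \text{for } t \in A, \\ 0 & \text{otherwise.} \end{cases}$$

We abbreviate $\iota^{A+a} = \iota^{A \cup \{a\}}$ when $a \notin A$ and $\iota^{A-a} = \iota^{A - \{a\}}$ when $a \in A$.


Lemma 12-10: If $\{x_t\}$ is a finite random field, $\iota = \{i_t\} \in \Omega$, $A \subset T$ and $a \notin A$, then

$$\frac{\mu_a^T(\iota^A)}{\mu_a^T(\iota^{A+a})} = \frac{\mu(\{\iota^A\})}{\mu(\{\iota^{A+a}\})}.$$

Proof: $\mu_a^T(\iota^A) = \Pr[x_t = i_t^A \text{ for all } t \in T] / \Pr[x_t = i_t^A \text{ for all } t \in T - \{a\}]$. When we replace $\iota^A$ by $\iota^{A+a}$ the denominator is unchanged, since $i_t^A = i_t^{A+a}$ for $t \in T - \{a\}$.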


Lemma 12-11 (Möbius inversion formula): Let $\Lambda$ be finite, and let $\Phi$ and $\Psi$ be real-valued set functions defined on all subsets of $\Lambda$. Then

$$\Phi(A) = \sum_{B \subset A} (-1)^{|A - B|} \Psi(B) \quad \text{for all } A \subset \Lambda$$

if and only if

$$\Psi(A) = \sum_{B \subset A} \Phi(B) \quad \text{for all } A \subset \Lambda.$$

(Here $|A|$ denotes the cardinality of the set $A$.)


Proof: Assume that the first condition holds. Then

$$\sum_{B \subset A} \Phi(B) = \sum_{B \subset A} \sum_{D \subset B} (-1)^{|B - D|} \Psi(D) = \sum_{D \subset A} \Psi(D) \Bigl[\sum_{E \subset A - D} (-1)^{|E|}\Bigr] = \Psi(A),$$

since the bracketed sum above is 1 if $D = A$ and 0 otherwise. The opposite implication is verified by an analogous computation.
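The Möbius inversion formula is easy to test numerically. This Python sketch (illustrative only) represents set functions as dictionaries keyed by `frozenset`s, applies the first formula to random values of $\Psi$, and recovers $\Psi$ through the second; `mobius` and `zeta` are names chosen here, not taken from the text.

```python
import random
from itertools import combinations


def subsets(A):
    """All subsets of the finite set A, as frozensets."""
    A = sorted(A)
    return [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]


def mobius(psi, Lam):
    """Phi(A) = sum over B ⊂ A of (-1)^{|A-B|} Psi(B), for every A ⊂ Lam."""
    return {A: sum((-1) ** len(A - B) * psi[B] for B in subsets(A))
            for A in subsets(Lam)}


def zeta(phi, Lam):
    """Psi(A) = sum over B ⊂ A of Phi(B), for every A ⊂ Lam."""
    return {A: sum(phi[B] for B in subsets(A)) for A in subsets(Lam)}


random.seed(0)
Lam = {1, 2, 3, 4}
psi = {A: random.uniform(-1, 1) for A in subsets(Lam)}
phi = mobius(psi, Lam)
recovered = zeta(phi, Lam)
print(max(abs(recovered[A] - psi[A]) for A in psi))
```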


Theorem 12-12: Let $\{x_t\}$ be a finite random field with local characteristics $\{\mu_a^T\}$. Then $\{x_t\}$ is a Gibbs field with canonical potential $V$ defined by $V_\emptyset = 0$, and for $A \ne \emptyset$,

$$V_A(\iota) = \sum_{B \subset A} (-1)^{|A - B|} \ln \mu(\{\iota^B\}) = \sum_{B \subset A} (-1)^{|A - B|} \ln \mu_a^T(\iota^B)$$

for any fixed $a \in A$. Moreover, $V$ is the unique normalized potential for $\{x_t\}$.

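Proof: Let $0$ denote the configuration with a 0 at every site of $T$. Fix $\iota \in \Omega$. For $A \subset T$, define $\Psi(A) = \ln[\mu(\{\iota^A\})/\mu(\{0\})]$ and $\Phi(A) = V_A(\iota)$, where $V$ is the potential given by the first sum in the theorem. When $A \ne \emptyset$ we have $\sum_{B \subset A} (-1)^{|A - B|} = 0$, and hence

$$\Phi(A) = \sum_{B \subset A} (-1)^{|A - B|} \ln \mu(\{\iota^B\}) - \ln \mu(\{0\}) \sum_{B \subset A} (-1)^{|A - B|} = \sum_{B \subset A} (-1)^{|A - B|} \Psi(B).$$

When $A = \emptyset$, $\Phi(\emptyset) = V_\emptyset(\iota) = 0 = \ln[\mu(\{\iota^\emptyset\})/\mu(\{0\})] = \Psi(\emptyset)$, since $\iota^\emptyset = 0$ for any $\iota$. Applying Lemma 12-11 with $\Lambda = T$ we conclude that

$$\ln \frac{\mu(\{\iota\})}{\mu(\{0\})} = \Psi(T) = \sum_{A \subset T} \Phi(A) = \sum_{A \subset T} V_A(\iota) = H_V(\iota),$$

and hence

$$\mu(\{\iota\}) = \mu(\{0\})\, e^{H_V(\iota)} = z^{-1} e^{H_V(\iota)}.$$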
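Thus $\{x_t\}$ is a Gibbs field with potential $V$. For any $a \in A \subset T$ we can write

$$V_A(\iota) = \sum_{B \subset A - \{a\}} (-1)^{|A - B|} \bigl[\ln \mu(\{\iota^B\}) - \ln \mu(\{\iota^{B+a}\})\bigr].$$

This shows that $V$ is normalized, since $\iota^B = \iota^{B+a}$ whenever $i_a = 0$. By Lemma 12-10 the right hand side of the last equation may be rewritten as

$$\sum_{B \subset A - \{a\}} (-1)^{|A - B|} \bigl[\ln \mu_a^T(\iota^B) - \ln \mu_a^T(\iota^{B+a})\bigr] = \sum_{B \subset A} (-1)^{|A - B|} \ln \mu_a^T(\iota^B),$$

which establishes the second expression for $V$ in the statement of the theorem. Finally, suppose that $U$ is any normalized potential for $\{x_t\}$. Then $H_U(0) = 0$ and $U_D(\iota^B) = 0$ unless $D \subset B$. Therefore

$$\ln \frac{\mu(\{\iota^B\})}{\mu(\{0\})} = H_U(\iota^B) = \sum_{D \subset T} U_D(\iota^B) = \sum_{D \subset B} U_D(\iota^A)$$

whenever $B \subset A \subset T$. If we apply Lemma 12-11 with $\Lambda = A$, $\Phi(D) = U_D(\iota^A)$ and $\Psi(B) = \ln[\mu(\{\iota^B\})/\mu(\{0\})]$, the conclusion is

$$U_A(\iota) = U_A(\iota^A) = \sum_{B \subset A} (-1)^{|A - B|} \ln \frac{\mu(\{\iota^B\})}{\mu(\{0\})} = V_A(\iota),$$

since the last sum is 0 when $A = \emptyset$, and otherwise

$$\ln \mu(\{0\}) \sum_{B \subset A} (-1)^{|A - B|} = 0.$$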
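The following sketch checks Theorem 12-12 numerically on $T = \{0, 1, 2\}$, $S = \{0, 1\}$, for an arbitrary strictly positive measure (the weights are assumptions). It computes $V$ from both formulas of the theorem, with $a = \min A$ in the second, and confirms that they agree, that $V$ is normalized, and that $\mu(\{\iota\}) = \mu(\{0\}) e^{H_V(\iota)}$.

```python
import math
from itertools import combinations, product

T = (0, 1, 2)
S = (0, 1)
configs = list(product(S, repeat=len(T)))
# An arbitrary strictly positive measure on Omega = S^T (hypothetical weights).
weights = [1.0, 2.0, 0.5, 3.0, 1.5, 0.7, 2.2, 0.9]
mu = {c: w / sum(weights) for c, w in zip(configs, weights)}
ZERO = (0,) * len(T)


def subsets(A):
    A = sorted(A)
    return [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]


def restrict(iota, B):
    """The modification iota^B: keep iota on B, put 0 elsewhere."""
    return tuple(iota[t] if t in B else 0 for t in T)


def local_char(a, iota):
    """mu_a^T(iota) = Pr[x_a = i_a | x_t = i_t for t != a]."""
    den = sum(mu[iota[:a] + (s,) + iota[a + 1:]] for s in S)
    return mu[iota] / den


def V(A, iota):
    """Canonical potential via the first formula of Theorem 12-12."""
    if not A:
        return 0.0
    return sum((-1) ** len(A - B) * math.log(mu[restrict(iota, B)])
               for B in subsets(A))


def V_local(A, iota):
    """Same potential via local characteristics, with a = min(A)."""
    a = min(A)
    return sum((-1) ** len(A - B) * math.log(local_char(a, restrict(iota, B)))
               for B in subsets(A))


errors = []
for iota in configs:
    H = sum(V(A, iota) for A in subsets(T))
    errors.append(abs(H - math.log(mu[iota] / mu[ZERO])))   # mu = z^{-1} e^{H_V}
    for A in subsets(T):
        if A:
            errors.append(abs(V(A, iota) - V_local(A, iota)))  # formulas agree
            if any(iota[a] == 0 for a in A):
                errors.append(abs(V(A, iota)))                 # V is normalized
print(f"largest deviation: {max(errors):.1e}")
```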


Corollary 12-13: A finite random field is completely determined by its local characteristics $\{\mu_a^T\}$.


Proof: The second equation in Theorem 12-12 shows that the canonical potential $V$ for $\mu$ is determined by the local characteristics, and $V$ determines $\mu$.


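Proposition 12-14: Let $\{x_t\}$ be a finite Gibbs field with potential $U$. Then the canonical potential $V$ for $\{x_t\}$ is related to $U$ by

$$V_A(\iota) = \sum_{B \subset A \subset D \subset T} (-1)^{|A - B|} U_D(\iota^B), \qquad A \ne \emptyset.$$


Proof: Since $\{x_t\}$ is Gibbs,

$$\ln \frac{\mu(\{\iota^B\})}{\mu(\{\iota^{B+a}\})} = H_U(\iota^B) - H_U(\iota^{B+a}) = \sum_{D \subset T} \bigl[U_D(\iota^B) - U_D(\iota^{B+a})\bigr]$$

for any $a \in T$ and $B \subset T - \{a\}$. Using the first equation for $V$ in Theorem 12-12, it now follows that whenever $a \in A \subset T$,

$$V_A(\iota) = \sum_{B \subset A - \{a\}} (-1)^{|A - B|} \ln \frac{\mu(\{\iota^B\})}{\mu(\{\iota^{B+a}\})} = \sum_{D \subset T} \sum_{B \subset A} (-1)^{|A - B|} U_D(\iota^B) = \sum_{D \subset T} \sum_{E \subset D \cap A} U_D(\iota^E) \Bigl[\sum_{B \subset A:\ B \cap D = E} (-1)^{|A - B|}\Bigr],$$

since $U_D(\iota^B) = U_D(\iota^{B \cap D})$. The inner sum in the last expression is 0 unless $D^c \cap A = \emptyset$, i.e., unless $A \subset D$, in which case it reduces to $(-1)^{|A - E|}$.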

Corollary 12-15: Given two potentials $U'$ and $U''$, let $\Delta_A = U'_A - U''_A$. $U'$ and $U''$ determine the same finite Gibbs field if and only if

$$\sum_{B \subset A \subset D \subset T} (-1)^{|A - B|} \Delta_D(\iota^B) = 0 \quad \text{for every } A \ne \emptyset \text{ and every } \iota \in \Omega.$$


Proof: Letting $V'$ and $V''$ be the canonical potentials corresponding to $U'$ and $U''$ respectively, Proposition 12-14 shows that the given equation is equivalent to $V'_A = V''_A$.


3. Equivalence of finite Markov and neighbor Gibbs fields 


We now prove an important equivalence theorem which states that 
the finite Markov fields are precisely those for which the canonical V 
is a neighbor potential. 


Theorem 12-16: Let $\{x_t\}$ be a finite random field with canonical potential $V$. Then $\{x_t\}$ is a Markov field if and only if $V$ is a neighbor potential.


Proof: Fix $a \in T$ and $\iota, \iota' \in \Omega$ such that $i_t = i'_t$ whenever $t \in \bar a$. Let $\iota_s$ and $\iota'_s$ be the modifications of $\iota$ and $\iota'$ respectively obtained by replacing the value at site $a$ with $s \in S$. If $V$ is a neighbor potential, then

$$H_V(\iota_s) = \sum_{A \subset T - \{a\}} V_A(\iota_s) + \sum_{a \in A \in \mathcal{C}} V_A(\iota_s) + \sum_{a \in A \notin \mathcal{C}} V_A(\iota_s) = \Sigma_1 + \Sigma_2(s) + 0,$$

where $\Sigma_1$ is independent of $s$. Let $\Sigma'_1$ and $\Sigma'_2(s)$ be the corresponding
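sums when $\iota_s$ is replaced by $\iota'_s$. Since $\iota_s$ and $\iota'_s$ agree on $\bar a$, and every clique containing $a$ lies in $\bar a$, we have $\Sigma_2(s) = \Sigma'_2(s)$ for all $s$. Thus

$$\mu_a^T(\iota_{s_0}) = \frac{e^{\Sigma_1 + \Sigma_2(s_0)}}{\sum_{s \in S} e^{\Sigma_1 + \Sigma_2(s)}} = \frac{e^{\Sigma_2(s_0)}}{\sum_{s \in S} e^{\Sigma_2(s)}} = \frac{e^{\Sigma'_1 + \Sigma'_2(s_0)}}{\sum_{s \in S} e^{\Sigma'_1 + \Sigma'_2(s)}} = \mu_a^T(\iota'_{s_0})$$

for any $s_0 \in S$. Taking $s_0 = i_a = i'_a$, we have verified (3) of Proposition 12-6; this shows that $\{x_t\}$ is Markov. Conversely, if $\{x_t\}$ is Markov, we claim that the canonical potential $V$ is a neighbor potential. To see this, choose $a, b \in A \subset T$ such that $b \notin \bar a$. Expand $V$ as

$$V_A(\iota) = \sum_{B \subset A - \{a, b\}} (-1)^{|A - B|} \ln \frac{\mu_a^T(\iota^B)\, \mu_a^T(\iota^{B+a+b})}{\mu_a^T(\iota^{B+a})\, \mu_a^T(\iota^{B+b})}.$$

Since $b \notin \bar a$, Proposition 12-6 shows that $\mu_a^T(\iota^{B+b}) = \mu_a^T(\iota^B)$ and $\mu_a^T(\iota^{B+a+b}) = \mu_a^T(\iota^{B+a})$, so every logarithm vanishes and $V_A = 0$ whenever $A$ is not a clique. This is the desired result.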
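A small numerical illustration of Theorem 12-16, assuming a normalized neighbor pair potential with arbitrary coefficients on the path $0 - 1 - 2 - 3$ and $S = \{0, 1\}$: the canonical potential recovered from $\mu$ should vanish on every non-clique and, by the uniqueness part of Theorem 12-12, coincide with $U$ on the cliques.

```python
import math
from itertools import combinations, product

# Hypothetical normalized neighbour pair potential on the path 0-1-2-3, S = {0, 1}:
# U_{a}(iota) = h[a] * i_a and U_{a,a+1}(iota) = J[(a, a+1)] * i_a * i_{a+1}.
T = (0, 1, 2, 3)
S = (0, 1)
h = {0: 0.2, 1: -0.4, 2: 0.5, 3: 0.1}
J = {(0, 1): 0.7, (1, 2): -0.3, (2, 3): 1.1}


def energy(c):
    return (sum(h[a] * c[a] for a in T)
            + sum(w * c[a] * c[b] for (a, b), w in J.items()))


configs = list(product(S, repeat=len(T)))
z = sum(math.exp(energy(c)) for c in configs)
mu = {c: math.exp(energy(c)) / z for c in configs}


def restrict(iota, B):
    return tuple(iota[t] if t in B else 0 for t in T)


def V(A, iota):
    """Canonical potential (first formula of Theorem 12-12) for non-empty A."""
    return sum((-1) ** (len(A) - r) * math.log(mu[restrict(iota, B)])
               for r in range(len(A) + 1) for B in combinations(sorted(A), r))


def is_clique(A):
    """On a path, a set is a clique iff it is a singleton or an adjacent pair."""
    return all(b - a == 1 for a, b in combinations(sorted(A), 2))


non_clique = max(abs(V(A, c)) for c in configs
                 for r in range(2, len(T) + 1) for A in combinations(T, r)
                 if not is_clique(A))
clique_err = max(
    max(max(abs(V((a,), c) - h[a] * c[a]) for a in T),
        max(abs(V(e, c) - w * c[e[0]] * c[e[1]]) for e, w in J.items()))
    for c in configs
)
print(f"max |V_A| over non-cliques: {non_clique:.1e}")
print(f"max |V - U| on cliques:     {clique_err:.1e}")
```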


Corollary 12-17: There is a one-to-one correspondence between the local characteristics $\{\mu_a^T; a \in T\}$ for finite Markov fields and normalized (canonical) potentials $V$ for finite neighbor Gibbs fields, given by

$$V_A(\iota) = \sum_{B \subset A} (-1)^{|A - B|} \ln \mu_a^T(\iota^B), \qquad a \in A \in \mathcal{C},$$

and

$$\mu_a^T(\iota) = z^{-1} \exp\Bigl\{\sum_{a \in C \in \mathcal{C}} V_C(\iota)\Bigr\},$$

where $z$ is the appropriate normalizing constant.

Proof: If $\{x_t\}$ is Markov, then $V_A(\iota) = 0$ for $A \notin \mathcal{C}$, by the last theorem. The rest of the first equation above was proved in Theorem 12-12, and the second equation was derived in Theorem 12-16.


Another consequence of Theorem 12-16 is an alternative formulation 
for finite Markov fields. 


Corollary 12-18: A finite random field $\{x_t\}$ is Markov if and only if

$$\mu_A^\Lambda(\iota) = \mu_A^{\bar A}(\iota) \quad \text{whenever } \bar A \subset \Lambda \subset T.$$


Proof: Let $\{x_t\}$ be Markov, with canonical neighbor potential $V$. A computation similar to the one in the proof of Theorem 12-16 shows that

$$\mu_A^T(\iota) = z^{-1} \exp\Bigl\{\sum_{B \subset \bar A:\ B \cap A \ne \emptyset} V_B(\iota)\Bigr\} = \mu_A^T(\iota')$$

whenever $i_t = i'_t$ for all $t \in \bar A$. The claim now follows from the straightforward generalization of Proposition 12-6 obtained by replacing $\{a\}$ with $A$.


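Example 12-19: The Markov process case. Let $\{x_n; 0 \le n \le N\}$ be a denumerable Markov process viewed from time 0 to time $N$. Suppose that $\{x_n\}$ has starting distribution $\pi$ and one-step transition matrix $P_n$ at time $n$, where

$$P_{n,ij} = \Pr[x_{n+1} = j \mid x_n = i].$$

If $\pi$ and all the $P_n$ are strictly positive, then $\{x_n\}$ is a Markov field with neighbor system $\partial$, where $\partial 0 = \{1\}$, $\partial N = \{N - 1\}$, and $\partial n = \{n - 1, n + 1\}$ for $1 \le n < N$. The local characteristics for $\{x_n\}$ are given by

$$\mu_0^T(\iota) = \frac{\pi_{i_0} P_{0, i_0 i_1}}{\sum_{i \in S} \pi_i P_{0, i i_1}},$$

$$\mu_n^T(\iota) = \frac{P_{n-1, i_{n-1} i_n} P_{n, i_n i_{n+1}}}{\sum_{i \in S} P_{n-1, i_{n-1} i} P_{n, i i_{n+1}}}, \qquad 1 \le n < N,$$

$$\mu_N^T(\iota) = P_{N-1, i_{N-1} i_N}.$$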
The canonical potential for the process is then given by

$$V_{\{n\}}(\iota) = \ln \frac{\mu_n^T(\iota^{\{n\}})}{\mu_n^T(0)}, \qquad 0 \le n \le N,$$

$$V_A(\iota) = \ln \frac{\mu_n^T(\iota^A)\, \mu_n^T(0)}{\mu_n^T(\iota^{\{n-1\}})\, \mu_n^T(\iota^{\{n\}})}, \qquad A = \{n - 1, n\},\ 1 \le n \le N,$$

$$V_A(\iota) = 0 \quad \text{otherwise},$$

and $\{x_n\}$ is a neighbor Gibbs field with normalized potential $V$. Conversely, suppose that $\{x_n; 0 \le n \le N\}$ is a Markov field with the above neighbor system $\partial$. A routine calculation using the explicit representation for $\mu$ in terms of $V$ shows that

$$\Pr[x_{n+1} = j \mid x_n = i_n, \ldots, x_0 = i_0] = \Pr[x_{n+1} = j \mid x_n = i_n]$$

whenever $0 \le n < N$, so $\{x_n\}$ is a Markov process. On a finite time parameter set we see that the one- and two-sided Markov properties are equivalent, so that Markov fields are precisely the Markov processes with positive cylinder probabilities.
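For Example 12-19, the three formulas for the local characteristics can be compared with brute-force conditional probabilities. The sketch assumes $N = 3$, $S = \{0, 1\}$ and arbitrary strictly positive $\pi$ and $P_n$ (stored in a list `Ps`, a name chosen here); it checks the interior formula as well as the endpoint cases $n = 0$ and $n = N$.

```python
from itertools import product

# Hypothetical inhomogeneous chain on S = {0, 1} observed at times 0..N, N = 3.
S = (0, 1)
N = 3
pi = [0.4, 0.6]
Ps = [                      # Ps[n] is the transition matrix P_n at time n
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.6, 0.4], [0.3, 0.7]],
]


def prob(path):
    """Pr[x_0 = path[0], ..., x_N = path[N]]."""
    p = pi[path[0]]
    for n in range(N):
        p *= Ps[n][path[n]][path[n + 1]]
    return p


def local_char(n, path):
    """Brute-force mu_n^T: condition on every coordinate except x_n."""
    den = sum(prob(path[:n] + (s,) + path[n + 1:]) for s in S)
    return prob(path) / den


def formula(n, path):
    """The closed forms for mu_0^T, mu_n^T (0 < n < N) and mu_N^T."""
    if n == 0:
        i0, i1 = path[0], path[1]
        return pi[i0] * Ps[0][i0][i1] / sum(pi[i] * Ps[0][i][i1] for i in S)
    if n == N:
        return Ps[N - 1][path[N - 1]][path[N]]
    a, j, b = path[n - 1], path[n], path[n + 1]
    den = sum(Ps[n - 1][a][i] * Ps[n][i][b] for i in S)
    return Ps[n - 1][a][j] * Ps[n][j][b] / den


err = max(abs(local_char(n, p) - formula(n, p))
          for p in product(S, repeat=N + 1) for n in range(N + 1))
print(f"largest discrepancy: {err:.1e}")
```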
4. Markov fields and neighbor Gibbs fields: the infinite case


For the remainder of the chapter, $T$ will be a countably infinite set with some neighbor system $\partial$. A neighbor potential $U$ on $T$ is defined just as in the previous sections. But we can no longer define the probability measure $\mu$ for a neighbor Gibbs field with potential $U$ explicitly; instead we must make use of the local characteristics.


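Definition 12-20: An infinite random field $\{x_t\}$ is a neighbor Gibbs field with neighbor potential $U$ if

$$\mu_a^\Lambda(\iota) = z^{-1} \exp\Bigl\{\sum_{a \in B \subset \bar a} U_B(\iota)\Bigr\} \quad \text{for all finite } \Lambda \supset \bar a,$$

where $z$ is the appropriate normalizing constant. If $s \in S$ and $\iota_s$ is the modification obtained by replacing $i_a$ with $s$, then

$$z = \sum_{s \in S} \exp\Bigl\{\sum_{a \in B \subset \bar a} U_B(\iota_s)\Bigr\}.$$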


Theorem 12-21: Let $\{x_t\}$ be an infinite random field. Then $\{x_t\}$ is a neighbor Gibbs field if and only if it is a Markov field.


Proof: Suppose $\{x_t\}$ is neighbor Gibbs. From the definition we see that $\mu_a^\Lambda(\iota) = \mu_a^\Lambda(\iota')$ whenever $\Lambda \supset \bar a$ and $\iota'$ agrees with $\iota$ on $\bar a$. Just as in Proposition 12-6, this implies that $\mu_a^\Lambda(\iota) = \mu_a^{\bar a}(\iota)$ for all finite $\Lambda \supset \bar a$, so $\{x_t\}$ is Markov. Conversely, if $\{x_t\}$ is Markov we define a potential $V$ by $V_\emptyset = 0$ and

$$V_A(\iota) = \sum_{B \subset A} (-1)^{|A - B|} \ln \mu_a^{\bar A}(\iota^B), \qquad a \in A.$$

The argument of Theorem 12-16 shows that $V$ is a neighbor potential, and the only normalized potential determined by the $\{\mu_a^\Lambda\}$. An application of the Möbius inversion formula shows that $V$ satisfies the conditions in Definition 12-20, so $\{x_t\}$ is a neighbor Gibbs field. We remark here that Corollary 12-18 also holds for infinite fields; the proof is routine.


Let $\mathcal{G}_V = \mathcal{G}_V(T)$ be the class of neighbor Gibbs fields on the infinite set $T$ with canonical potential $V$. The bijection of Corollary 12-17, which carries over to the finite subsets of $T$, shows that we may consider equivalently the class of Markov fields with local characteristics $\{\mu_a^\Lambda\}$ corresponding to $V$. Since we will be considering many fields simultaneously, the elements of $\mathcal{G}_V$ will be thought of as measures $\mu$ governing $\{x_t\}$. In the finite case we have seen that $\mathcal{G}_V$ always contains a single field. When $T$ is infinite this need not be so; indeed, there are exactly three possibilities:

(i) $\mathcal{G}_V = \emptyset$; (ii) $|\mathcal{G}_V| = 1$; (iii) $|\mathcal{G}_V| = \infty$.

This follows from the fact that $\mathcal{G}_V$ is convex, i.e., if $\mu_1, \mu_2 \in \mathcal{G}_V$ and $0 < \alpha < 1$, then $\alpha\mu_1 + (1 - \alpha)\mu_2 \in \mathcal{G}_V$. Examples of (i)-(iii) will be presented in Section 5.


Definition 12-22: When $|\mathcal{G}_V| = \infty$ we say that there is phase multiplicity (or phase transition) for $V$. A measure $\mu \in \mathcal{G}_V$ is extreme if whenever $\mu = \alpha\mu_1 + (1 - \alpha)\mu_2$, $\mu_1, \mu_2 \in \mathcal{G}_V$, $0 < \alpha < 1$, then $\mu_1 = \mu_2 = \mu$. The class of extreme elements of $\mathcal{G}_V$ is denoted by $\mathcal{E}_V$.

Since $\mathcal{G}_V$ is convex, in the case of phase multiplicity one would hope for an integral representation in terms of $\mathcal{E}_V$. We will obtain such a representation, along with a number of other results on the structure of $\mathcal{G}_V$, by connecting neighbor Gibbs fields with Martin boundaries for certain Markov chains. The remainder of this section is devoted to the study of general structural properties of $\mathcal{G}_V$ with the aid of the boundary theory developed in Chapter 10.

To begin, we fix a neighbor potential $V$, assume $\mathcal{G}_V \ne \emptyset$, and choose a reference measure $\nu \in \mathcal{G}_V$. Also we fix an increasing sequence $\{A(n), n = 0, 1, \ldots\}$ of finite subsets of $T$, such that $\bar A(n) \subset A(n+1)$ and $A(n) \uparrow T$ as $n \to \infty$. Write $K(0) = A(0)$, $K(n) = A(n) - A(n-1)$ for $n \ge 1$. Then any configuration $\iota \in \Omega$ may be thought of as a sequence of subconfigurations $(\kappa^0, \kappa^1, \ldots)$, where $\kappa^n = \{k_t^n; t \in K(n)\}$ satisfies $k_t^n = i_t$ when $t \in K(n)$. For brevity's sake we denote $\{\omega \mid x_t(\omega) = k_t^n \text{ for all } t \in K(n)\}$ simply as $[\kappa^n]$. Similarly, $[\kappa^0, \kappa^1, \ldots, \kappa^n]$ means $\{\omega \mid x_t(\omega) = k_t^r \text{ for all } t \in K(r),\ 0 \le r \le n\}$, and so forth. Also, we write $\nu(A \mid B) = \nu(A \cap B)/\nu(B)$ when convenient (and, of course, $\nu(B) > 0$). With these notations in effect, observe the following key property of neighbor Gibbs fields.


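Proposition 12-23: If $\iota = (\kappa^0, \kappa^1, \ldots) \in \Omega$, then

$$\nu([\kappa^0, \ldots, \kappa^{m-1}] \mid [\kappa^m, \ldots, \kappa^n]) = \nu([\kappa^0, \ldots, \kappa^{m-1}] \mid [\kappa^m]), \qquad 1 \le m \le n < \infty.$$


Proof: If $\{\nu_A^\Lambda\}$ are the characteristics of $\nu$, then the left hand side divided by the right hand side is equal to $\nu_{A(m-1)}^{A(n)}(\iota)/\nu_{A(m-1)}^{A(m)}(\iota) = 1$, since the numerator and denominator of this last quotient both equal $\nu_{A(m-1)}^{\bar A(m-1)}(\iota)$ by the Markov field property.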

The above result reveals a Markovian structure for $\nu$ which can be exhibited explicitly in terms of a Markov chain. The states for the chain are all possible pairs $(n, \kappa^n)$, $n \ge 0$, $\kappa^n \in S^{K(n)}$. The transition matrix is

$$P_{(n, \kappa^n), (n+1, \kappa^{n+1})} = \nu([\kappa^{n+1}] \mid [\kappa^n]).$$

Proposition 12-23 and a simple induction show that the $n$-step transition matrix is given by

$$P^n_{(m, \kappa^m), (m+n, \kappa^{m+n})} = \nu([\kappa^{m+n}] \mid [\kappa^m]).$$

If the initial distribution is $\pi_{(0, \kappa^0)} = \nu([\kappa^0])$, and $\{y_n\}$ denotes the resulting chain, then we obtain the simple relationship

$$\Pr_\pi[y_n = (n, \kappa^n)] = \nu([\kappa^n]).$$


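Next, we connect $P$-regular functions with fields in $\mathcal{G}_V$. A lemma will be useful for this purpose.


Lemma 12-24: If $\mu \in \mathcal{G}_V$, $\iota = (\kappa^0, \kappa^1, \ldots) \in \Omega$, then

$$\frac{\mu([\kappa^0, \kappa^1, \ldots, \kappa^n])}{\nu([\kappa^0, \kappa^1, \ldots, \kappa^n])} = \frac{\mu([\kappa^m, \ldots, \kappa^n])}{\nu([\kappa^m, \ldots, \kappa^n])}, \qquad 1 \le m \le n < \infty.$$


Proof: The left hand expression may be rewritten as

$$\frac{\mu([\kappa^m, \ldots, \kappa^n])\, \mu_{A(m-1)}^{A(n)}(\iota)}{\nu([\kappa^m, \ldots, \kappa^n])\, \nu_{A(m-1)}^{A(n)}(\iota)},$$

and the two characteristics agree, being identical functionals of the potential $V$.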


As in Section 10-6, we call a non-negative $P$-regular function $h$ normalized if

$$\pi h = \sum_{\kappa^0 \in S^{K(0)}} \pi_{(0, \kappa^0)} h_{(0, \kappa^0)} = 1,$$

and minimal normalized if it cannot be written as a non-trivial convex combination of two distinct non-negative regular functions.


Theorem 12-25: There is a one-to-one correspondence between non-negative normalized $P$-regular functions $h$ and neighbor Gibbs fields $\mu \in \mathcal{G}_V$ given by

$$h_{(n, \kappa^n)} = \frac{\mu([\kappa^n])}{\nu([\kappa^n])}$$

and

$$\mu([\kappa^0, \ldots, \kappa^n]) = \nu([\kappa^0, \ldots, \kappa^n])\, h_{(n, \kappa^n)},$$

$n \ge 0$, $\kappa^n \in S^{K(n)}$. Moreover, $h$ is minimal normalized if and only if $\mu$ is in $\mathcal{E}_V$.
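Proof: Given $\mu \in \mathcal{G}_V$, define $h$ according to the first equation in the theorem. By Lemma 12-24,

$$\sum_{\kappa^{n+1}} P_{(n, \kappa^n), (n+1, \kappa^{n+1})} h_{(n+1, \kappa^{n+1})} = \sum_{\kappa^{n+1}} \frac{\nu([\kappa^n, \kappa^{n+1}])}{\nu([\kappa^n])} \cdot \frac{\mu([\kappa^n, \kappa^{n+1}])}{\nu([\kappa^n, \kappa^{n+1}])} = \frac{\mu([\kappa^n])}{\nu([\kappa^n])} = h_{(n, \kappa^n)},$$

so $h$ is $P$-regular. Clearly, $h$ is also non-negative and normalized.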
Conversely, let $h$ be any non-negative normalized $P$-regular function. We claim that the second equation of the theorem prescribes cylinder probabilities for a unique measure $\mu \in \mathcal{G}_V$ determined by $h$. Note first that $h$ must be strictly positive. To see this, write

$$\sum_{\kappa^{n+1}} \nu([\kappa^{n+1}])\, h_{(n+1, \kappa^{n+1})} = \pi P^{n+1} h = 1$$

and

$$h_{(n, \kappa^n)} = (Ph)_{(n, \kappa^n)} = \sum_{\kappa^{n+1}} \frac{\nu([\kappa^n, \kappa^{n+1}])}{\nu([\kappa^n])}\, h_{(n+1, \kappa^{n+1})}.$$

The first equation above implies that $h_{(n+1, \kappa^{n+1})} > 0$ for some $\kappa^{n+1}$, and hence the second shows that $h_{(n, \kappa^n)} > 0$ for all $(n, \kappa^n)$. Thus all cylinder
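measures for $\mu$ are evidently positive. Next, we use Proposition 12-23 to compute

$$\sum_{\kappa^{n+1}} \mu([\kappa^0, \ldots, \kappa^{n+1}]) = \nu([\kappa^0, \ldots, \kappa^n]) \sum_{\kappa^{n+1}} \nu([\kappa^{n+1}] \mid [\kappa^n])\, h_{(n+1, \kappa^{n+1})} = \nu([\kappa^0, \ldots, \kappa^n])\, (Ph)_{(n, \kappa^n)} = \nu([\kappa^0, \ldots, \kappa^n])\, h_{(n, \kappa^n)} = \mu([\kappa^0, \ldots, \kappa^n]).$$

This shows that the measures $\mu([\kappa^0, \ldots, \kappa^n])$ on $S^{A(n)}$ are consistent for $n \ge 0$, and also by induction that

$$\sum_{\kappa^0, \ldots, \kappa^{n+1}} \mu([\kappa^0, \ldots, \kappa^{n+1}]) = \sum_{\kappa^0, \ldots, \kappa^n} \mu([\kappa^0, \ldots, \kappa^n]) = 1$$

for all $n \ge 0$, since $\sum_{\kappa^0} \mu([\kappa^0]) = \pi h = 1$. By the usual extension theorems we obtain a unique $\mu$ with the desired cylinder probabilities.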


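To see that $\mu \in \mathcal{G}_V$, fix finite $A$, $\Lambda$ with $\bar A \subset \Lambda$, and choose $n$ large enough that $\Lambda \subset A(n - 1)$. Let $\iota = \{i_t; t \in T\} = (\kappa^0, \kappa^1, \ldots) \in \Omega$. Introduce the sets

$$[\iota_\Lambda] = \{\omega \mid x_t(\omega) = i_t \text{ for all } t \in \Lambda\},$$

$$[\iota_{\Lambda - A}] = \{\omega \mid x_t(\omega) = i_t \text{ for all } t \in \Lambda - A\},$$

$$[\psi] = \{\omega \mid x_t(\omega) = \psi_t \text{ for all } t \in A(n-1) - \Lambda\}, \quad \text{where } \psi \in S^{A(n-1) - \Lambda}.$$

Then by construction,

$$\mu([\iota_\Lambda]) = \sum_\psi \sum_{\kappa^n} \nu([\iota_\Lambda] \cap [\psi] \cap [\kappa^n])\, h_{(n, \kappa^n)}$$

and

$$\mu([\iota_{\Lambda - A}]) = \sum_\psi \sum_{\kappa^n} \nu([\iota_{\Lambda - A}] \cap [\psi] \cap [\kappa^n])\, h_{(n, \kappa^n)},$$

where $\psi$ is summed over $S^{A(n-1) - \Lambda}$, $\kappa^n$ over $S^{K(n)}$. If $\iota^{\psi, \kappa^n}$ denotes the modification of $\iota$ obtained by replacing its values on $A(n-1) - \Lambda$ with $\psi$, and its values on $K(n)$ with $\kappa^n$, then $\nu_A^{A(n)}(\iota^{\psi, \kappa^n}) = \nu_A^\Lambda(\iota)$ for all $\psi$ and $\kappa^n$, since $\nu \in \mathcal{G}_V$. Thus

$$\mu([\iota_\Lambda]) = \sum_\psi \sum_{\kappa^n} \nu_A^{A(n)}(\iota^{\psi, \kappa^n})\, \nu([\iota_{\Lambda - A}] \cap [\psi] \cap [\kappa^n])\, h_{(n, \kappa^n)} = \nu_A^\Lambda(\iota)\, \mu([\iota_{\Lambda - A}]),$$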
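and so $\mu_A^\Lambda(\iota) = \nu_A^\Lambda(\iota) = \nu_A^{\bar A}(\iota)$. Hence $\mu \in \mathcal{G}_V$. To check the one-to-one correspondence, let $\mathcal{H}$ denote the normalized non-negative $P$-regular functions. We have defined mappings $\rho: \mathcal{G}_V \to \mathcal{H}$ and $\sigma: \mathcal{H} \to \mathcal{G}_V$. One easily verifies that $\sigma(\rho(\mu)) = \mu$ and $\rho(\sigma(h)) = h$, as desired. Finally, $\rho$ is obviously a cone homomorphism in the sense that

$$\rho(\alpha\mu_1 + (1 - \alpha)\mu_2) = \alpha\rho(\mu_1) + (1 - \alpha)\rho(\mu_2),$$

$0 \le \alpha \le 1$, $\mu_1, \mu_2 \in \mathcal{G}_V$. This implies the last assertion in the theorem, and the proof is complete.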


Using the Martin boundary theory for the chain $P$ with starting distribution $\pi$, as constructed from the reference measure $\nu \in \mathcal{G}_V$, we will now derive several structure properties of $\mathcal{G}_V$.

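Definition 12-26: When $\mu \in \mathcal{G}_V$ and $n \ge 1$, set

$$\mu^{\kappa^n}([\kappa^0, \ldots, \kappa^{n-1}]) = \mu_{A(n-1)}^{A(n)}(\iota), \qquad \iota = (\kappa^0, \ldots, \kappa^n, \ldots) \in \Omega.$$

For each fixed $\kappa^n$ on $K(n)$, $\mu^{\kappa^n}(\cdot)$ defines a finite random field on $S^{A(n-1)}$. The fields $\mu^{\kappa^n}$ are said to have thermodynamic limit $\mu_\infty$, $\text{t-lim}_{n \to \infty} \mu^{\kappa^n} = \mu_\infty$ in notation, if there is a measure $\mu_\infty$ on $\Omega$ such that

$$\lim_{n \to \infty} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m]) = \mu_\infty([\kappa^0, \ldots, \kappa^m])$$

for all $m \ge 0$ and all configurations $(\kappa^0, \ldots, \kappa^m)$ on $S^{A(m)}$.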


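Theorem 12-27:

$$\nu\bigl(\{\omega \mid \text{t-lim}_{n \to \infty} \nu^{\kappa^n(\omega)} \text{ exists and belongs to } \mathcal{E}_V\}\bigr) = 1,$$

where $\kappa^n(\omega)$ is the configuration of $\omega$ restricted to $K(n)$. In particular, $\mathcal{E}_V \ne \emptyset$ whenever $\mathcal{G}_V \ne \emptyset$.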


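Proof: Let $B_e$ comprise the extreme points of the Martin boundary for $P$ started with $\pi$, and let $\lambda$ denote harmonic measure. Theorem 10-41 applied to the constant regular function $h = 1$ shows that $\Pr_\pi[y_n \to B_e] = \lambda(B_e) = 1$. Since $\{y_n\}$ visits each state $(m, \kappa^m)$ at most once, the Martin kernel is given by

$$K((m, \kappa^m), (n, \kappa^n)) = \frac{P^{n-m}_{(m, \kappa^m), (n, \kappa^n)}}{\sum_{\kappa^0} \pi_{(0, \kappa^0)} P^n_{(0, \kappa^0), (n, \kappa^n)}} = \frac{\nu([\kappa^m, \kappa^n])}{\nu([\kappa^m])\, \nu([\kappa^n])}, \qquad n \ge m$$

($= 0$ otherwise). Thus $y_n(\omega) \to x \in B_e$ means that

$$K((m, \kappa^m), x) = \lim_{n \to \infty} \frac{\nu([\kappa^m, \kappa^n(\omega)])}{\nu([\kappa^m])\, \nu([\kappa^n(\omega)])}$$

exists for every $(m, \kappa^m)$, and is a minimal regular function of $(m, \kappa^m)$.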
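By the last theorem, $K(\cdot, x)$ is minimal regular if and only if

$$\nu([\kappa^0, \ldots, \kappa^m])\, K((m, \kappa^m), x) = \mu_\infty([\kappa^0, \ldots, \kappa^m]) \quad \text{for a (unique) } \mu_\infty \in \mathcal{E}_V.$$

In this case, we deduce from Proposition 12-23 that

$$\frac{\mu_\infty([\kappa^0, \ldots, \kappa^m])}{\nu([\kappa^0, \ldots, \kappa^m])} = \lim_{n \to \infty} \frac{\nu([\kappa^0, \ldots, \kappa^m] \mid [\kappa^n(\omega)])}{\nu([\kappa^0, \ldots, \kappa^m])} = \lim_{n \to \infty} \frac{\nu^{\kappa^n(\omega)}([\kappa^0, \ldots, \kappa^m])}{\nu([\kappa^0, \ldots, \kappa^m])}$$

for all $m$, $\kappa^m$. This shows that $\text{t-lim}_{n \to \infty} \nu^{\kappa^n(\omega)} \in \mathcal{E}_V$ on $\{\omega : y_n(\omega) \to B_e\}$. Using the reference measure $\nu \in \mathcal{G}_V$, we have thus produced a set of $\nu$-measure 1 on which the thermodynamic limit exists and lies in $\mathcal{E}_V$.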


Theorem 12-28: The elements of $\mathcal{E}_V$ are in one-to-one correspondence with those of $B_e$. If $\mu^x \in \mathcal{E}_V$ corresponds to $x \in B_e$, then there is a bijection between the probability measures on $B_e$ and the neighbor Gibbs fields in $\mathcal{G}_V$, given by the equations

$$\mu = \int_{x \in B_e} \mu^x \, d\lambda^\mu(x)$$

and

$$\lambda^\mu(E) = \mu\bigl(\{\omega \mid \text{t-lim}_{n \to \infty} \mu^{\kappa^n(\omega)} = \mu^x \text{ for some } x \in E\}\bigr), \qquad E \text{ Borel in } B_e.$$


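Proof: We know from Chapter 10 that $B_e$ is in one-to-one correspondence with the class $\mathcal{H}_e$ of minimal normalized regular functions, while Theorem 12-25 gives a bijection between $\mathcal{H}_e$ and $\mathcal{E}_V$. Hence there is a one-to-one correspondence $x \mapsto \mu^x$ between $B_e$ and $\mathcal{E}_V$. For $\mu \in \mathcal{G}_V$, apply Theorem 10-41 to $\rho(\mu) \in \mathcal{H}$, and use Lemma 12-24, to get

$$\frac{\mu([\kappa^0, \ldots, \kappa^m])}{\nu([\kappa^0, \ldots, \kappa^m])} = \int_{x \in B_e} \frac{\mu^x([\kappa^0, \ldots, \kappa^m])}{\nu([\kappa^0, \ldots, \kappa^m])} \, d\lambda^{\rho(\mu)}(x),$$

where $\lambda^h$ is harmonic measure for the $h$-process. A routine computation using the explicit form of the Martin kernel, derived in the proof of the previous theorem, shows that

$$\lambda^{\rho(\mu)}(E) = \mu\bigl(\{\omega \mid \text{t-lim}_{n \to \infty} \nu^{\kappa^n(\omega)} = \mu^x \text{ for some } x \in E\}\bigr) = \mu\bigl(\{\omega \mid \text{t-lim}_{n \to \infty} \mu^{\kappa^n(\omega)} = \mu^x \text{ for some } x \in E\}\bigr).$$

(The $\rho(\mu)$-process changes the reference measure to $\mu$, and $\mu^{\kappa^n} = \nu^{\kappa^n}$ since both random fields are defined in terms of the same characteristics.) Setting $\lambda^\mu = \lambda^{\rho(\mu)}$, the theorem follows.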


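Corollary 12-29: If $\mu \in \mathcal{E}_V$, then there is an $\iota = (\kappa^0, \kappa^1, \ldots) \in \Omega$ such that

$$\mu = \text{t-lim}_{n \to \infty} \mu^{\kappa^n}.$$


Proof: The uniqueness of the integral representation implies that if $\mu = \mu^x \in \mathcal{E}_V$, then $\lambda^\mu(\{x\}) = \mu(\{\omega : \text{t-lim}_{n \to \infty} \mu^{\kappa^n(\omega)} = \mu\}) = 1$. The desired $\iota$ may therefore be any configuration from a set of full $\mu$-measure.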

The entire development of this section has proceeded on the assumption that $\mathcal{G}_V$ is not empty. The problem of determining from the potential $V$ just when $\mathcal{G}_V$ is not empty turns out to be a difficult one if the state space $S$ is countably infinite. In the case where $T$ is the integers, we shall have more to say about this later. When $S$ is finite, on the other hand, it will now be proved that $\mathcal{G}_V$ is always non-empty.
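Theorem 12-30: If $S$ is finite, then $\mathcal{G}_V \ne \emptyset$. Moreover, the limits

$$\mu^+([\kappa^0, \ldots, \kappa^m]) = \lim_{n \to \infty} \max_{\kappa^n \in S^{K(n)}} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m])$$

and

$$\mu^-([\kappa^0, \ldots, \kappa^m]) = \lim_{n \to \infty} \min_{\kappa^n \in S^{K(n)}} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m])$$

exist for all $m \ge 0$, $(\kappa^0, \ldots, \kappa^m) \in S^{A(m)}$, and there is phase multiplicity for $V$ if and only if

$$\mu^+([\kappa^0, \ldots, \kappa^m]) \ne \mu^-([\kappa^0, \ldots, \kappa^m]) \quad \text{for some } m \text{ and } (\kappa^0, \ldots, \kappa^m).$$

(Recall that $\mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m])$ is a certain characteristic which is completely and uniquely determined by $V$.)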


To prove the theorem, we first need the following lemma. 


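Lemma 12-31: For given $\mu \in \mathcal{G}_V$, $m \ge 0$ and fixed $(\kappa^0, \ldots, \kappa^m) \in S^{A(m)}$, abbreviate

$$\mu_n^+ = \max_{\kappa^n \in S^{K(n)}} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m]), \qquad \mu_n^- = \min_{\kappa^n \in S^{K(n)}} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m]).$$

Then

(1) $0 < \mu_n^- \le \mu([\kappa^0, \ldots, \kappa^m]) \le \mu_n^+$ for each $n > m$, and
(2) $\mu_n^-$ is increasing and $\mu_n^+$ is decreasing as $n \to \infty$.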


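Proof: (1) $\mu_n^-$ is a minimum over a finite set of strictly positive probabilities, hence strictly positive. When $n > m$,

$$\mu([\kappa^0, \ldots, \kappa^m]) = \sum_{\kappa^n} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m])\, \mu([\kappa^n]) \ge \sum_{\kappa^n} \mu_n^-\, \mu([\kappa^n]) = \mu_n^-,$$

and an analogous estimate establishes the remaining inequality.
(2) For any $n > m$ and $\kappa^{n+1} \in S^{K(n+1)}$, the Markov property (Proposition 12-23) gives

$$\mu^{\kappa^{n+1}}([\kappa^0, \ldots, \kappa^m]) = \sum_{\kappa^n} \mu([\kappa^0, \ldots, \kappa^m] \mid [\kappa^n, \kappa^{n+1}])\, \mu([\kappa^n] \mid [\kappa^{n+1}]) = \sum_{\kappa^n} \mu^{\kappa^n}([\kappa^0, \ldots, \kappa^m])\, \mu([\kappa^n] \mid [\kappa^{n+1}]) \ge \mu_n^- \sum_{\kappa^n} \mu([\kappa^n] \mid [\kappa^{n+1}]) = \mu_n^-.$$

This shows that $\mu_n^-$ is increasing with $n$; a similar estimate proves that $\mu_n^+$ is decreasing.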


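Proof of Theorem 12-30: The limits $\mu^+$ and $\mu^-$ are well-defined by the monotonicity established in the lemma. We now show that for any given configuration $(\kappa^0, \ldots, \kappa^m)$ on $A(m)$, there is a random field $\mu \in \mathcal{G}_V$ such that $\mu([\kappa^0, \ldots, \kappa^m]) = \mu^-([\kappa^0, \ldots, \kappa^m])$. Let $\hat\kappa^n$ denote the configuration on $K(n)$ for which the minimum value $\mu_n^-$ is attained, and define measures $\mu_n$, $n \ge 1$, on $\Omega$ by

$$\mu_n([\iota_\Lambda]) = \mu^{\hat\kappa^n}([\iota_\Lambda]) \quad \text{whenever } \Lambda \subset A(n-1),$$

$$\mu_n(\{\omega : x_t(\omega) = \hat k_t^n \text{ for all } t \in K(n)\}) = 1,$$

$$\mu_n(\{\omega : x_t(\omega) = 0\}) = 1 \quad \text{whenever } t \notin A(n).$$

(The notation $[\iota_\Lambda]$ was introduced in the proof of Theorem 12-25.) These specifications are clearly consistent, so each $\mu_n$ is well-defined. Now by the finiteness of $S$, we can use a diagonal argument (like the one in Proposition 1-63) to choose a subsequence $\{\mu_{n'}\}$ such that $\mu_{n'}([\iota_\Lambda]) \to \mu([\iota_\Lambda])$ for all possible configurations on every finite $\Lambda \subset T$. By the extension theorem, these cylinder limits give rise to a unique $\mu$ on $\Omega$. Now observe that

$$\mu([\kappa^0, \ldots, \kappa^m]) = \lim_{n' \to \infty} \mu_{n'}([\kappa^0, \ldots, \kappa^m]) = \lim_{n' \to \infty} \mu^{\hat\kappa^{n'}}([\kappa^0, \ldots, \kappa^m]),$$

the last limit being equal to $\mu^-([\kappa^0, \ldots, \kappa^m])$ by definition. To verify that $\mu \in \mathcal{G}_V$, we first note that the $\mu_{n'}$-measures of $[\iota_\Lambda]$ are bounded away from 0 for $n'$ sufficiently large, since $\mu_{n'}([\iota_\Lambda]) = \mu^{\hat\kappa^{n'}}([\iota_\Lambda])$ is at least $\min_{\kappa^{n'}} \mu^{\kappa^{n'}}([\iota_\Lambda])$, which is strictly positive as soon as $\Lambda \subset A(n' - 1)$ and, by the argument of Lemma 12-31, does not decrease as $n'$ grows. It follows that $\mu$ is strictly positive on finite cylinders, and that the neighbor Gibbs property is inherited from the $\mu_{n'}$ (i.e., the limit may be interchanged with the operations defining the characteristics). This completes the proof that $\mathcal{G}_V$ is always non-empty. By an analogous construction we find a neighbor Gibbs field $\tilde\mu \in \mathcal{G}_V$ with $\tilde\mu([\kappa^0, \ldots, \kappa^m]) = \mu^+([\kappa^0, \ldots, \kappa^m])$. If $\mu^+ \ne \mu^-$ (for some $(\kappa^0, \ldots, \kappa^m)$), then evidently $|\mathcal{G}_V| > 1$, and hence $|\mathcal{G}_V| = \infty$. If $\mu^+ = \mu^-$, then Lemma 12-31 shows that any $\mu \in \mathcal{G}_V$ is uniquely determined on all events $[\kappa^0, \ldots, \kappa^m]$, $m \ge 0$, hence on all $[\iota_\Lambda]$, finite $\Lambda \subset T$. This shows that $|\mathcal{G}_V| = 1$, completing the proof.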
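To see Lemma 12-31 and the criterion of Theorem 12-30 at work, the sketch below takes $T = Z$ with $A(n) = \{-n, \ldots, n\}$, so that $K(n) = \{-n, n\}$, and a homogeneous nearest-neighbor interaction on $S = \{0, 1\}$ whose parameters `h` and `J` are assumptions. It uses the block formula of Section 5, in the form $\Pr[x_0 = 1 \mid x_{-n} = a, x_n = b] = (Q^n)_{a1}(Q^n)_{1b}/(Q^{2n})_{ab}$, to tabulate $\mu_n^-$ and $\mu_n^+$ for the single cylinder $[x_0 = 1]$.

```python
import math

# Hypothetical homogeneous interaction on Z with S = {0, 1}, in the Q-form of
# Section 5: Q_jk = exp{u_j/2 + u_jk + u_k/2}.
h, J = -0.5, 1.2
u = [0.0, h]                 # u_j
uu = [[0.0, 0.0], [0.0, J]]  # u_jk
Q = [[math.exp(u[j] / 2 + uu[j][k] + u[k] / 2) for k in range(2)] for j in range(2)]


def matmul(A, B):
    return [[sum(A[i][s] * B[s][j] for s in range(2)) for j in range(2)]
            for i in range(2)]


def matpow(A, n):
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        R = matmul(R, A)
    return R


def origin_given_boundary(n, a, b):
    """mu^{kappa^n}([x_0 = 1]) with kappa^n = (x_{-n}, x_n) = (a, b)."""
    Qn = matpow(Q, n)
    return Qn[a][1] * Qn[1][b] / matmul(Qn, Qn)[a][b]


lower, upper = [], []
for n in range(1, 16):
    vals = [origin_given_boundary(n, a, b) for a in (0, 1) for b in (0, 1)]
    lower.append(min(vals))   # mu_n^-
    upper.append(max(vals))   # mu_n^+
print(f"n=15: mu_n^- = {lower[-1]:.10f}, mu_n^+ = {upper[-1]:.10f}")
```

For this one-dimensional example the gap $\mu_n^+ - \mu_n^-$ shrinks geometrically, which is consistent with (though, for a single cylinder, not a proof of) the absence of phase multiplicity.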

Unfortunately, the general criterion for phase multiplicity just given is often difficult to apply, since $\mu^+$ and $\mu^-$ may not be readily computable. A more detailed theory is available for certain "attractive potentials," examples of which will be mentioned in Section 6.
5. Homogeneous Markov fields on the integers


Throughout this section the time parameter set $T$ will be $Z$ = the integers, $N = \{0, 1, \ldots\}$ or $-N = \{\ldots, -1, 0\}$, with the standard neighbor structure $\partial$ (i.e. $\partial n = \{n' \in T : |n' - n| = 1\}$). Our objective is to treat, in some detail, the classes of Markov fields on these infinite linear lattices.

First let us consider the “one-sided” cases. We show here the 
previously mentioned fact that the Markov random fields on N are 
simply the Markov processes with strictly positive cylinders. 


Proposition 12-32: If $T = N$ and $\mu \in \mathcal{G}_V$, then $\{x_n\}$ is a Markov process. In particular, there are probability measures $\pi_n$ and transition matrices $P_n$, $n \ge 0$, such that

$$\pi_{n,i} = \Pr[x_n = i], \qquad P_{n,ij} = \Pr[x_{n+1} = j \mid x_n = i],$$

and $\pi_n P_n = \pi_{n+1}$. Similarly, if $T = -N$ and $\mu \in \mathcal{G}_V$, then $\{x_n\}$ is a Markov process on $-N$, and there are probability measures $\pi_n$ and transition matrices $P_n$, $n < 0$, satisfying the above relations.


Proof: Without loss of generality, assume that $V$ is normalized. It suffices to assume $T = N$ and check the Markov property (the proof for $T = -N$ being analogous). Fix $n > 0$, and set $\Lambda = \{1, 2, \ldots, n - 1\}$. For any $\iota \in \Omega$, define

$$Z_n(i_0, i_n) = \sum_{\kappa \in S^\Lambda} \exp\Bigl\{\sum_{B \subset \bar\Lambda:\ B \cap \Lambda \ne \emptyset} V_B(\iota^\kappa)\Bigr\},$$

where $\iota^\kappa$ is the modification obtained by replacing the values of $\iota$ on $\Lambda$ with those of $\kappa$. Now choose a particular $\iota \in \Omega$, and let $\iota'$ be the configuration obtained by replacing the value $i_0$ at site 0 with a 0. Then, using the Markov field property at 0, we have


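$$\frac{\Pr[x_0 = i_0 \wedge x_n = i_n]}{\Pr[x_0 = 0 \wedge x_n = i_n]} = \frac{\mu_0^{\bar 0}(\iota)\, \mu_\Lambda^{\bar\Lambda}(\iota')}{\mu_0^{\bar 0}(\iota')\, \mu_\Lambda^{\bar\Lambda}(\iota)}.$$

Writing the characteristics according to Definition 12-20 (and its analogue for $\mu_\Lambda^{\bar\Lambda}$), and using the facts that $\iota$ and $\iota'$ differ only at site 0 and that $V$ is normalized, the last term is

$$\exp\{V_{\{0\}}(\iota) + V_{\{0,1\}}(\iota)\} \cdot \frac{Z_n(i_0, i_n)}{Z_n(0, i_n)} \cdot \frac{\exp\bigl\{\sum_{B \subset \bar\Lambda:\ B \cap \Lambda \ne \emptyset} V_B(\iota')\bigr\}}{\exp\bigl\{\sum_{B \subset \bar\Lambda:\ B \cap \Lambda \ne \emptyset} V_B(\iota)\bigr\}} = \frac{Z_n(i_0, i_n)}{Z_n(0, i_n)}\, e^{V_{\{0\}}(\iota)}$$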
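after cancellation. Hence

$$\Pr[x_0 = i_0 \wedge \cdots \wedge x_n = i_n] = \mu_\Lambda^{\bar\Lambda}(\iota) \Pr[x_0 = i_0 \wedge x_n = i_n] = Z_n(i_0, i_n)^{-1} \exp\Bigl\{\sum_{B \subset \bar\Lambda:\ B \cap \Lambda \ne \emptyset} V_B(\iota)\Bigr\} \Pr[x_0 = i_0 \wedge x_n = i_n]$$

$$= Z_n(0, i_n)^{-1} \exp\Bigl\{V_{\{0\}}(\iota) + \sum_{B \subset \bar\Lambda:\ B \cap \Lambda \ne \emptyset} V_B(\iota)\Bigr\} \Pr[x_0 = 0 \wedge x_n = i_n].$$

Finally, after further cancellation, we obtain

$$\Pr[x_n = i_n \mid x_0 = i_0 \wedge \cdots \wedge x_{n-1} = i_{n-1}] = \frac{Z_{n-1}(0, i_{n-1})}{Z_n(0, i_n)} \exp\{V_{\{n-1\}}(\iota) + V_{\{n-1, n\}}(\iota)\} \frac{\Pr[x_0 = 0 \wedge x_n = i_n]}{\Pr[x_0 = 0 \wedge x_{n-1} = i_{n-1}]},$$

a conditional probability depending only on $n - 1$, $i_{n-1}$, and $i_n$, which we may set equal to $P_{n-1, i_{n-1} i_n}$.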
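The key fact in this proof, that the conditional law of $x_n$ given the whole past depends only on $x_{n-1}$, can be observed numerically on a finite window. The Python sketch below is only an illustration under assumptions: it replaces $T = N$ by the sites $0, \ldots, 4$, uses an arbitrary inhomogeneous nearest-neighbor potential on $S = \{0, 1\}$ (coefficients `h` and `J` chosen here), and compares $\Pr[x_k = j \mid x_0, \ldots, x_{k-1}]$ for pasts that share the same value of $x_{k-1}$.

```python
import math
from itertools import product

# Finite-window illustration (sites 0..4 instead of all of N) with an
# arbitrary inhomogeneous nearest-neighbour potential on S = {0, 1}.
S = (0, 1)
L = 5
h = [0.3, -0.2, 0.4, 0.0, -0.5]   # singleton coefficients
J = [0.9, -0.6, 1.3, 0.2]         # J[t] is the pair coefficient for sites t, t+1


def weight(c):
    single = sum(h[t] * c[t] for t in range(L))
    pair = sum(J[t] * c[t] * c[t + 1] for t in range(L - 1))
    return math.exp(single + pair)


configs = list(product(S, repeat=L))
Z = sum(weight(c) for c in configs)


def marginal(prefix):
    """Pr[(x_0, ..., x_{k-1}) = prefix] for k = len(prefix)."""
    k = len(prefix)
    return sum(weight(c) for c in configs if c[:k] == prefix) / Z


def forward(prefix, j):
    """Pr[x_k = j | (x_0, ..., x_{k-1}) = prefix]."""
    return marginal(prefix + (j,)) / marginal(prefix)


worst = 0.0
for k in range(1, L):
    for past in product(S, repeat=k):
        # Same x_{k-1}, but all earlier coordinates replaced by 0.
        other = (0,) * (k - 1) + (past[-1],)
        for j in S:
            worst = max(worst, abs(forward(past, j) - forward(other, j)))
print(f"largest dependence on the distant past: {worst:.1e}")
```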


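Proposition 12-33: Let $T = N$, and suppose that $\nu \in \mathcal{G}_V$ has initial measure $\pi_0$ and transition matrices $P_n$. Then any $\mu \in \mathcal{G}_V$ is Markovian with initial measure $\hat\pi_0$ and transition matrices $\hat P_n$ which satisfy

$$\hat\pi_{0, i_0} = \pi_{0, i_0} h_{(0, i_0)} \quad \text{and} \quad \hat P_{n, i_n i_{n+1}} = \frac{P_{n, i_n i_{n+1}} h_{(n+1, i_{n+1})}}{h_{(n, i_n)}}$$

for some solution $h$ of the equations

$$h_{(n, i_n)} = \sum_{i_{n+1} \in S} P_{n, i_n i_{n+1}} h_{(n+1, i_{n+1})}, \qquad n \ge 0,\ i_n \in S.$$

Conversely, any $\mu$ arising in this way is in $\mathcal{G}_V$.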


Proof: Apply the construction of the previous section with $T = N$ and $A(n) = \{0, 1, \ldots, n\}$. Then the correspondence of Theorem 12-25 is clearly equivalent to the one asserted here, because $\nu$ and $\mu$ are both Markovian by Proposition 12-32.
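The transformation in Proposition 12-33 is an $h$-transform of the reference chain. As an illustrative sketch only, the code truncates time to a finite horizon `N` (an assumption; the proposition concerns all $n \ge 0$), takes a homogeneous reference chain ($P_n = P$ for every $n$), builds $h$ by backward recursion from arbitrary positive terminal values `g`, normalizes so that $\sum_i \pi_{0,i} h_{(0,i)} = 1$, and checks that $\hat\pi_0$ is a probability vector and each $\hat P_n$ is stochastic.

```python
# Hypothetical reference chain nu on S = {0, 1}: initial pi0, homogeneous P.
S = (0, 1)
pi0 = [0.5, 0.5]
P = [[0.8, 0.2], [0.3, 0.7]]
N = 6            # finite horizon: an assumption made only for this sketch
g = [1.0, 3.0]   # arbitrary positive values for h at time N

# Backward recursion h_n(i) = sum_j P_ij h_{n+1}(j), starting from h_N = g.
h = [None] * (N + 1)
h[N] = list(g)
for n in range(N - 1, -1, -1):
    h[n] = [sum(P[i][j] * h[n + 1][j] for j in S) for i in S]

# Normalize so that sum_i pi0_i h_0(i) = 1 (the equations are linear in h).
c = sum(pi0[i] * h[0][i] for i in S)
h = [[x / c for x in level] for level in h]

# Transformed initial law and transition matrices of Proposition 12-33.
pi0_hat = [pi0[i] * h[0][i] for i in S]
P_hat = [[[P[i][j] * h[n + 1][j] / h[n][i] for j in S] for i in S]
         for n in range(N)]
print(pi0_hat)
print(P_hat[0])
```

Because each row of $\hat P_n$ sums to $h_{(n,i)}/h_{(n,i)} = 1$ exactly when $h$ satisfies the regularity equations, the check fails if the recursion is broken, which is the point of the sketch.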


Of course, an obvious analogue of the last result holds in case T = 
—N. A concrete example of phase multiplicity on a half-line will be 
given later in this section. 

We turn our attention now to the “two-sided” case, T = Z. In 
contrast to the one-sided setting, we shall see shortly that there are 
Markov fields on Z which are not Markov processes. Since the neighbor 
structure $\partial$ on $Z$ commutes with translation, it is possible to define a class of homogeneous Markov fields.

Definition 12-34: A Markov field $\{x_n\}_{n\in Z}$ is homogeneous if

$$\mu_m(\omega) = \mu_{m+n}(\theta_{-n}\omega) \quad\text{for all } m, n \in Z,\ \omega \in \Omega,$$

where the value of $\theta_{-n}\omega$ at site $t$ is $i_{t-n}$. A neighbor potential $U$ is homogeneous if

$$U_{\{m\}}(\omega) = u_j \quad\text{and}\quad U_{\{m,m+1\}}(\omega) = u_{jk} \quad\text{whenever } i_m = j,\ i_{m+1} = k.$$


Two facts follow immediately from the definitions. First, if $\{x_n\}$ is homogeneous Markov, then the canonical $V$ of Theorem 12-21 is a homogeneous neighbor potential. Second, any neighbor Gibbs field with homogeneous neighbor potential $U$ is homogeneous Markov.
For such a $U$, if we set $Q_{jk} = \exp\{\frac12u_j + u_{jk} + \frac12u_k\}$, then $\mu$ is a Gibbs field with potential $U$ if and only if

$$\mu_m(\omega) = \frac{e^{u_j + u_{ij} + u_{jk}}}{\sum_{s\in S}e^{u_s + u_{is} + u_{sk}}} = \frac{e^{-(u_i+u_k)/2}\,Q_{ij}Q_{jk}}{e^{-(u_i+u_k)/2}\sum_{s\in S}Q_{is}Q_{sk}} = \frac{Q_{ij}Q_{jk}}{(Q^2)_{ik}}$$

whenever $i_{m-1} = i$, $i_m = j$ and $i_{m+1} = k$. On the other hand, any strictly positive $Q$ defines consistent local characteristics by means of the above equation, and these in turn give rise to a homogeneous neighbor Gibbs potential for $\mu$, according to Theorem 12-21. Thus we obtain a multiplicative representation for the characteristics in terms of $Q$ which is more convenient than the one involving the canonical potential $V$. Let $\mathcal{G}_Q$ denote the class of Markov fields determined by $Q$ in this manner. An immediate requirement for $\mathcal{G}_Q$ to be nonempty is $Q^2 < \infty$ (entrywise), and since the local characteristics determine all the characteristics, it follows from this assumption that $Q^n < \infty$ for all $n$, so that

$$\mu_{[m,n]}(\omega) = \frac{Q_{i_{m-1}i_m}Q_{i_mi_{m+1}}\cdots Q_{i_ni_{n+1}}}{(Q^{n-m+2})_{i_{m-1}i_{n+1}}}$$

whenever $m \le n$. (Here $[m, n]$ denotes $\{m, m+1, \ldots, n\}$.)
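A small numerical sketch may make the multiplicative representation more concrete. The Python code below (NumPy assumed; the matrix is a hypothetical random strictly positive $Q$, not taken from the text) evaluates the single-site characteristic $Q_{ij}Q_{jk}/(Q^2)_{ik}$ and the block characteristic $\mu_{[m,n]}$, and checks one instance of their consistency: conditioning the two-site block $[m, m+1]$ on $x_{m+1} = k$ recovers the single-site formula.

```python
import numpy as np


def single_site(Q, i, j, k):
    """mu_m(omega) when i_{m-1} = i, i_m = j, i_{m+1} = k: Q_ij Q_jk / (Q^2)_ik."""
    return Q[i, j] * Q[j, k] / (Q @ Q)[i, k]


def block(Q, states, left, right):
    """mu_[m,n](omega) for values `states` on [m, n], boundary i_{m-1} = left, i_{n+1} = right."""
    path = [left, *states, right]
    numerator = np.prod([Q[a, b] for a, b in zip(path, path[1:])])
    power = np.linalg.matrix_power(Q, len(path) - 1)   # Q^{n-m+2}
    return numerator / power[left, right]


rng = np.random.default_rng(0)
Q = rng.uniform(0.1, 2.0, size=(4, 4))   # hypothetical strictly positive Q (not a transition matrix)
i, k, l = 0, 2, 3                        # boundary i_{m-1} = i, i_{m+2} = l; condition on i_{m+1} = k

joint = np.array([block(Q, [j, k], i, l) for j in range(4)])
from_block = joint / joint.sum()         # Pr[x_m = j | x_{m+1} = k, boundary] from the block formula
direct = np.array([single_site(Q, i, j, k) for j in range(4)])
print(np.allclose(from_block, direct))   # True
```

Nothing here requires $Q1 = 1$; only strict positivity is used, which is the point of working with $Q$ rather than with a transition matrix.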

We have seen that any homogeneous neighbor Gibbs field is in $\mathcal{G}_Q$ for some $Q$. But many matrices $Q'$ give rise to the same characteristics, just as in the potential representation. By definition, either $\mathcal{G}_{Q'} = \mathcal{G}_{Q''}$ or $\mathcal{G}_{Q'} \cap \mathcal{G}_{Q''} = \emptyset$; say that $Q'$ is equivalent to $Q''$ in the former case ($Q' \sim Q''$ in notation).


Proposition 12-35: Strictly positive matrices $Q'$ and $Q''$ are equivalent if and only if

$$Q''_{ij} = c\,Q'_{ij}\,h_j/h_i, \qquad i, j \in S,$$

for some $c > 0$ and strictly positive $h = (h_i)_{i\in S}$.

Proof: The statements and proofs of Proposition 12-14 and Corollary 12-15 are easily modified to apply to neighbor potentials on an infinite parameter set by replacing T with A. In our case, if $U'$ and $U''$ are the homogeneous neighbor potentials such that $u'_k = u''_k = 0$ for all $k$ and $u'_{jk} = \ln Q'_{jk}$, $u''_{jk} = \ln Q''_{jk}$, then $U'$ and $U''$ determine the same Gibbs fields if and only if

$$\sum_{B\subset A\subset D}(-1)^{|A-B|}\,\Delta_D(\omega^B) = 0 \quad\text{for every } A = \{m\}, \{m, m+1\},$$

where $\Delta_D = U'_D - U''_D$. Setting $\delta_{jk} = u'_{jk} - u''_{jk}$, these equations become

(1) $\delta_{k0} + \delta_{0k} - 2\delta_{00} = 0$,

(2) $\delta_{ij} - \delta_{i0} - \delta_{0j} + \delta_{00} = 0$, $\qquad i, j, k \in S$.

Adding to (2), with $j$ replaced by $k$, one half of (1) with $k$ replaced by $i$ and one half of (1) itself yields

(3) $(u'_{ik} - u''_{ik}) + w_i - w_k - z = 0$, $\qquad i, k \in S$,

where $w_k = \frac12(\delta_{0k} - \delta_{k0})$ and $z = \delta_{00}$.


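Defining $h_k = e^{-w_k}$ and $c = e^{-z}$, the desired equation in terms of $Q'$ and $Q''$ follows. Conversely, if the hypothesis holds then

$$\frac{Q''_{ij}Q''_{jk}}{(Q''^2)_{ik}} = \frac{c^2\,Q'_{ij}(h_j/h_i)\,Q'_{jk}(h_k/h_j)}{c^2\sum_{s\in S}Q'_{is}(h_s/h_i)\,Q'_{sk}(h_k/h_s)} = \frac{Q'_{ij}Q'_{jk}}{(Q'^2)_{ik}},$$

so that $Q'$ and $Q''$ determine the same local characteristics, i.e., $Q' \sim Q''$.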
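The converse half of this proof is easy to confirm numerically. In the sketch below (NumPy assumed; $Q'$, $h$ and $c$ are arbitrary hypothetical values) we form $Q''_{ij} = c\,Q'_{ij}h_j/h_i$ and compare the complete arrays of local characteristics $Q_{ij}Q_{jk}/(Q^2)_{ik}$ of the two matrices.

```python
import numpy as np


def local_characteristics(Q):
    """Array L[i, j, k] = Q_ij Q_jk / (Q^2)_ik, the single-site characteristic of G_Q."""
    Q2 = Q @ Q
    return Q[:, :, None] * Q[None, :, :] / Q2[:, None, :]


rng = np.random.default_rng(1)
Qp = rng.uniform(0.1, 2.0, size=(5, 5))    # plays the role of Q'
h = rng.uniform(0.5, 3.0, size=5)          # hypothetical strictly positive h
c = 2.7                                    # hypothetical constant c > 0
Qpp = c * Qp * h[None, :] / h[:, None]     # Q''_ij = c Q'_ij h_j / h_i

L1, L2 = local_characteristics(Qp), local_characteristics(Qpp)
print(np.allclose(L1, L2))                 # True
print(np.allclose(L1.sum(axis=1), 1.0))    # each characteristic sums to 1 over j: True
```

The factors $c$ and $h$ cancel for the same reason as in the displayed computation: $c^2$ appears in numerator and denominator, and the interior $h$'s telescope.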

The remainder of this section will be devoted to an analysis of $\mathcal{G}_Q$, the class of homogeneous Markov fields on $Z$ determined by a strictly positive matrix $Q$. We let $\mathcal{E}_Q$ comprise the extreme measures in $\mathcal{G}_Q$. It will now be proved, using the Martin boundary arguments of the previous section, that these extreme Markov fields on $Z$ are always Markov processes.


Theorem 12-36: If $\mu \in \mathcal{E}_Q$, then $\{x_n\}_{n\in Z}$ is a Markov process. Thus $\mu$ is determined by measures $\pi_n$ and transition matrices $P_n$, $n \in Z$, where

$$\pi_{n,i} = \Pr[x_n = i], \qquad P_{n,ij} = \Pr[x_{n+1} = j \mid x_n = i],$$

and $\pi_nP_n = \pi_{n+1}$.
Proof: It suffices to check the Markov property. For $n > 0$, let $A(n) = \{-n, -n+1, \ldots, n\}$. By Corollary 12-29, if $\mu \in \mathcal{E}_Q$, then $\mu = \lim_{n\to\infty}\mu^{x^n}$ for some sequence $(x^0, x^1, \ldots)$ in $\Omega$. This implies that there are states $k_n \in S$ for $n \in Z$ such that

$$\Pr[x_l = i_l \wedge x_{l+1} = i_{l+1} \wedge \cdots \wedge x_m = i_m] = \lim_{n\to\infty}\Pr[x_l = i_l \wedge x_{l+1} = i_{l+1} \wedge \cdots \wedge x_m = i_m \mid x_{-n} = k_{-n} \wedge x_n = k_n]$$

whenever $l < m$. Therefore

$$\Pr[x_m = i_m \mid x_l = i_l \wedge \cdots \wedge x_{m-1} = i_{m-1}] = \lim_{n\to\infty}\Pr[x_m = i_m \mid x_{-n} = k_{-n} \wedge x_l = i_l \wedge \cdots \wedge x_{m-1} = i_{m-1} \wedge x_n = k_n].$$

By the Markov field property, for all $n$ sufficiently large the right hand side becomes

$$\begin{aligned}
\lim_{n\to\infty}\Pr[x_m = i_m \mid x_{-n} = k_{-n} \wedge x_{m-1} = i_{m-1} \wedge x_n = k_n] &= \lim_{n\to\infty}\frac{\Pr[x_{m-1} = i_{m-1} \wedge x_m = i_m \mid x_{-n} = k_{-n} \wedge x_n = k_n]}{\Pr[x_{m-1} = i_{m-1} \mid x_{-n} = k_{-n} \wedge x_n = k_n]}\\
&= \Pr[x_m = i_m \mid x_{m-1} = i_{m-1}] = P_{m-1,i_{m-1}i_m}.
\end{aligned}$$


Next we present a useful representation theorem for the extreme 
homogeneous Gibbs states with matrix Q. 


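Theorem 12-37: If $\mu \in \mathcal{E}_Q$, then there are strictly positive functions $g_{(n,i)}$ and $h_{(n,i)}$ ($n \in Z$, $i \in S$) such that

(1) $g_{(n+1,i_{n+1})} = \sum_{i_n\in S}g_{(n,i_n)}Q_{i_ni_{n+1}}$,

(2) $h_{(n,i_n)} = \sum_{i_{n+1}\in S}Q_{i_ni_{n+1}}h_{(n+1,i_{n+1})}$,

and

(3) $\sum_{i_n\in S}g_{(n,i_n)}h_{(n,i_n)} = 1$,

and such that the measures $\pi_n$ and transition matrices $P_n$ for $\{x_n\}$ are given by

$$\pi_{n,i} = g_{(n,i)}h_{(n,i)} \quad\text{and}\quad P_{n,jk} = Q_{jk}\,h_{(n+1,k)}/h_{(n,j)}.$$

Moreover, there are constants $c', c'' > 0$ and $k_n \in S$ ($n \in Z$) such that

$$g_{(m,i_m)} = c'\lim_{n\to-\infty}\frac{(Q^{m-n})_{k_ni_m}}{(Q^{-n})_{k_n0}}, \qquad m < 0,\ i_m \in S,$$

$$h_{(m,i_m)} = c''\lim_{n\to\infty}\frac{(Q^{n-m})_{i_mk_n}}{(Q^n)_{0k_n}}, \qquad m > 0,\ i_m \in S.$$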

Proof: Since $\{x_n\}$ is a Markov process,

$$\mu_{n+1}(\omega) = \frac{Q_{jk}Q_{k0}}{(Q^2)_{j0}} = \frac{P_{n,jk}P_{n+1,k0}}{(P_nP_{n+1})_{j0}}$$

and

$$\mu_{n+2}(\omega) = \frac{Q_{k0}Q_{00}}{(Q^2)_{k0}} = \frac{P_{n+1,k0}P_{n+2,00}}{(P_{n+1}P_{n+2})_{k0}}$$

whenever $i_n = j$, $i_{n+1} = k$, $i_{n+2} = i_{n+3} = 0$. Let $\tilde h_{(n,j)} = (Q^2)_{j0}/(P_nP_{n+1})_{j0}$ and $c_n = Q_{00}/P_{n+1,00}$. Then dividing the first equation by the second and rearranging terms,

$$\frac{P_{n,jk}}{Q_{jk}} = \frac{1}{c_{n+1}}\,\frac{\tilde h_{(n+1,k)}}{\tilde h_{(n,j)}}.$$

Choose $\hat c_n$, $n \in Z$, so that $\hat c_0 = 1$ and $\hat c_{n-1}/\hat c_n = c_n$. Now define $h_{(n,j)} = \hat c_n\tilde h_{(n,j)}$ to get

$$P_{n,jk} = Q_{jk}\,h_{(n+1,k)}/h_{(n,j)},$$

as desired. Equation (2) follows immediately from the fact that $P_n$ is a transition matrix. If we define $g_{(n,i)} = \pi_{n,i}/h_{(n,i)}$, then (3) holds because $\pi_n$ is a probability measure, while the equation $\pi_nP_n = \pi_{n+1}$ implies (1). It therefore remains only to derive the representation of $g$ and $h$ as ratio limits of powers of $Q$. To this end, choose $A(n)$ and $k_n$ as in the proof of the previous theorem. Then for $m > 0$,

$$\begin{aligned}
\Pr[x_m = i_m \mid x_0 = 0] &= \lim_{n\to\infty}\Pr[x_m = i_m \mid x_{-n} = k_{-n} \wedge x_0 = 0 \wedge x_n = k_n]\\
&= \lim_{n\to\infty}\Pr[x_m = i_m \mid x_0 = 0 \wedge x_n = k_n],
\end{aligned}$$

the last since $\{x_n\}$ is a Markov field. In terms of $g$, $h$ and $Q$ we have

$$\frac{g_{(0,0)}(Q^m)_{0i_m}h_{(m,i_m)}}{g_{(0,0)}h_{(0,0)}} = \lim_{n\to\infty}\frac{g_{(0,0)}(Q^m)_{0i_m}(Q^{n-m})_{i_mk_n}h_{(n,k_n)}}{g_{(0,0)}(Q^n)_{0k_n}h_{(n,k_n)}},$$

so that

$$h_{(m,i_m)} = h_{(0,0)}\lim_{n\to\infty}\frac{(Q^{n-m})_{i_mk_n}}{(Q^n)_{0k_n}}.$$

An analogous computation yields the result for $g$ when $m < 0$.


With the aid of the above representation, cases where $|\mathcal{E}_Q| = 0$, 1 and $\infty$ will now be discussed.


Example 12-38: Let $S = Z$, and consider any matrix $Q$ of the form

$$Q_{ij} = q_{j-i}, \quad i, j \in Z, \qquad\text{where } \sum_{i\in Z}q_i = 1.$$

Suppose that $\mu \in \mathcal{E}_Q$, with $g$ and $h$ the functions of Theorem 12-37. Equations (1) and (2) of that theorem say that $g_{(-m,\cdot)}$ and $h_{(m,\cdot)}$, $m \ge 0$, are space-time harmonic for sums of independent random variables. By analogy to Example 6 of Section 10-13, one can show that the extreme such functions are of the form $c^mt^i$ for some $c > 0$, $t > 0$ ($q_i > 0$ excludes degenerate solutions). Thus,

$$g_{(0,n)}h_{(0,n)} = \int_0^\infty\!\!\int_0^\infty(st)^n\,d\nu(s)\,d\rho(t)$$

for some measures $\nu$ and $\rho$ on $(0, \infty)$. It follows that

$$\sum_{n\in Z}g_{(0,n)}h_{(0,n)} = \int_0^\infty\!\!\int_0^\infty\Big[1 + \sum_{n=1}^\infty(st)^n + \sum_{n=1}^\infty(st)^{-n}\Big]\,d\nu(s)\,d\rho(t) = \infty,$$

since one of the two infinite sums must diverge for any given $s$ and $t$. This contradicts equation (3) of Theorem 12-37, so $\mathcal{E}_Q$ is empty. $\mathcal{G}_Q$ is therefore empty by Theorem 12-28.

If $Q > 0$ is an ergodic transition matrix, then there is a unique probability measure $\alpha > 0$ such that $\alpha Q = \alpha$. In this case $\mathcal{G}_Q$ clearly contains the stationary process with

$$\pi_{n,i} = \alpha_i \quad\text{and}\quad P_n = Q \quad\text{for all } n \in Z.$$

Thus, whenever $Q' \sim Q$ for some ergodic $Q$, then $|\mathcal{E}_{Q'}| \ne 0$. Our next goal is to show that when $S$ is finite $\mathcal{E}_{Q'}$ contains exactly one Markov field for any $Q' > 0$, and that this field is a stationary Markov chain on $Z$. The first step is contained in a lemma.


Lemma 12-39: If $S$ is finite, then any $Q' > 0$ is equivalent to a strictly positive transition matrix $Q$ (with $Q1 = 1$).

Proof: We show that there is a vector $h > 0$ such that $Q'h = \bar ch$ for some constant $\bar c > 0$. Then, defining

$$Q_{ij} = Q'_{ij}h_j/(\bar c\,h_i),$$

it follows that $Q$ is a transition matrix, while $Q' \sim Q$ by Proposition 12-35. To get $h$, let $\mathcal{F} = \{h = (h_i)_{i\in S}: h \ge 0 \text{ and } \sum_{i\in S}h_i = |S|\}$, and define $\bar c = \sup\{c: Q'h \ge ch \text{ for some } h \in \mathcal{F}\}$. Easy estimates prove that $0 < \min_{i,j}Q'_{ij} \le \bar c \le |S|\max_{i,j}Q'_{ij} < \infty$. By definition of $\bar c$ and Proposition 1-63, there are constants $c^{(n)}$ and elements $h^{(n)}$ of $\mathcal{F}$, $n = 0, 1, \ldots$, such that $c^{(n)} \uparrow \bar c$, $Q'h^{(n)} \ge c^{(n)}h^{(n)}$, and $\lim_nh^{(n)} = h$ for some $h \in \mathcal{F}$. This implies that $Q'h \ge \bar ch$. Now if $(Q'h)_j > \bar ch_j$ for some $j$, then $Q'(Q'h) > \bar c(Q'h)$, so $Q'(Q'h) \ge (\bar c + \varepsilon)(Q'h)$ for small $\varepsilon$. But this contradicts the definition of $\bar c$ once we normalize $Q'h$. Hence $\bar ch = Q'h > 0$, which shows that $h > 0$.
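The lemma is constructive once $h$ is known. The proof obtains $h$ through a supremum argument; the sketch below (NumPy assumed) substitutes ordinary power iteration, a different standard way of finding the positive eigenvector of a strictly positive matrix, and then forms $Q_{ij} = Q'_{ij}h_j/(\bar c\,h_i)$. The function name `to_transition_matrix`, the tolerance and the random $Q'$ are hypothetical choices for illustration.

```python
import numpy as np


def to_transition_matrix(Qp, tol=1e-14, max_iter=10_000):
    """Return (Q, c_bar, h) with Q' h = c_bar h, h > 0 and Q_ij = Q'_ij h_j / (c_bar h_i).

    h is found by power iteration rather than by the sup argument of the proof.
    """
    size = Qp.shape[0]
    h = np.ones(size)
    for _ in range(max_iter):
        new_h = Qp @ h
        new_h *= size / new_h.sum()          # keep h in the set F: entries sum to |S|
        converged = np.max(np.abs(new_h - h)) < tol
        h = new_h
        if converged:
            break
    c_bar = (Qp @ h)[0] / h[0]
    Q = Qp * h[None, :] / (c_bar * h[:, None])
    return Q, c_bar, h


rng = np.random.default_rng(2)
Qp = rng.uniform(0.1, 2.0, size=(4, 4))      # hypothetical Q' > 0
Q, c_bar, h = to_transition_matrix(Qp)
print(np.allclose(Q.sum(axis=1), 1.0), bool(np.all(Q > 0)))   # True True
```

By Proposition 12-35 (with $c = 1/\bar c$) the resulting $Q$ is equivalent to $Q'$, so it can replace $Q'$ in everything that follows.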


Theorem 12-40: If $S$ is finite, then $|\mathcal{E}_{Q'}| = 1$. $\mathcal{G}_{Q'}$ consists of the Markov chain with transition matrix $Q$ and stationary measure $\alpha$, where $Q$ is defined as in Lemma 12-39 and $\alpha$ is the regular probability measure for $Q$.

Proof: Suppose $\mu \in \mathcal{E}_Q$, and let $k_n$, $n \in Z$, be the states in the ratio limit representation of Theorem 12-37 for the functions $g_{(m,\cdot)}$ and $h_{(m,\cdot)}$. Since $S$ is finite, there is some state $j'' \in S$ and an infinite sequence $n'' \to \infty$ such that $k_{n''} = j''$. Hence

$$h_{(m,i_m)} = c''\,\frac{\lim_{n''\to\infty}(Q^{n''-m})_{i_mj''}}{\lim_{n''\to\infty}(Q^{n''})_{0j''}} = c''\,\frac{\alpha_{j''}}{\alpha_{j''}} = c''$$

by the convergence theorem for noncyclic ergodic chains. Thus $h$ is constant. Similarly, there is some $j'$ and sequence $n' \to -\infty$ with $k_{n'} = j'$, whereby

$$g_{(m,i_m)} = c'\,\frac{\lim_{n'\to-\infty}(Q^{m-n'})_{j'i_m}}{\lim_{n'\to-\infty}(Q^{-n'})_{j'0}} = c'\,\frac{\alpha_{i_m}}{\alpha_0}.$$

It follows that $\pi_{m,i} = \alpha_i$ and $P_{m,ij} = Q_{ij}$. In other words, $\mu$ is uniquely determined as the process described in the statement of the theorem. For any strictly positive finite matrix $Q'$ with equivalent transition matrix $Q$, we therefore have $|\mathcal{E}_{Q'}| = |\mathcal{E}_Q| = 1$. Theorem 12-28 now implies $|\mathcal{G}_{Q'}| = 1$.
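The two ratio limits in this proof can be watched numerically. In the following sketch (NumPy assumed; $Q$ is a hypothetical random strictly positive transition matrix, and $N = 200$ stands in for $n'' \to \infty$ and $-n' \to \infty$) the ratios defining $h_{(m,\cdot)}$ with $m = 5$ come out constant, while those defining $g_{(m,\cdot)}$ with $m = -5$ come out proportional to $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(3)
raw = rng.uniform(0.1, 1.0, size=(4, 4))
Q = raw / raw.sum(axis=1, keepdims=True)       # hypothetical strictly positive transition matrix

# regular probability measure alpha: left eigenvector of Q for the eigenvalue 1
eigvals, eigvecs = np.linalg.eig(Q.T)
alpha = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
alpha /= alpha.sum()

N, j = 200, 2                                  # large N, fixed reference state k_n = j
QN = np.linalg.matrix_power(Q, N)
QN5 = np.linalg.matrix_power(Q, N - 5)

h_ratios = QN5[:, j] / QN[0, j]                # (Q^{N-m})_{i j} / (Q^N)_{0 j} with m = 5
g_ratios = QN5[j, :] / QN[j, 0]                # (Q^{N+m})_{j i} / (Q^N)_{j 0} with m = -5
print(np.allclose(h_ratios, 1.0), np.allclose(g_ratios, alpha / alpha[0]))   # True True
```

Both facts reflect $Q^N \to 1\alpha$, which is exactly the convergence theorem invoked in the proof.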


Next we present a concrete example of a matrix $Q$ with phase multiplicity, for which all the elements of $\mathcal{E}_Q$ can be exhibited explicitly.


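Example 12-41: Let $S = \{0, 1, \ldots\}$, and consider the strictly positive matrix $Q$ given by

$$Q_{ij} = \sum_{k=0}^{a_{ij}}\binom{i}{k}\Big(\frac12\Big)^i\,\frac{e^{-1/2}(1/2)^{j-k}}{(j-k)!}, \qquad a_{ij} = \min(i, j).$$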


$Q$ may be thought of as describing transition from $i$ to $j$ particles in a population. First particles disappear independently with probability 1/2, and then an independent Poisson distributed population of mean 1/2 is added to those which remain. This interpretation makes it clear that $Q1 = 1$. We remark next that $Q$ is ergodic, with regular measure $\alpha_i = e^{-1}/i!$, $i \in S$. In fact, $Q$ is $\alpha$-reversible:

$$\alpha_iQ_{ij} = e^{-3/2}\sum_{k=0}^{\min(i,j)}\left[\frac{(1/2)^{i+j-k}}{k!\,(i-k)!\,(j-k)!}\right] = \alpha_jQ_{ji},$$

since the terms in square brackets are symmetric in $i$ and $j$ for every $k$. Thus $\mathcal{G}_Q$ contains the stationary Markov chain with $\pi_{n,i} = \alpha_i$ and $P_{n,ij} = Q_{ij}$ for every $n \in Z$. In order to determine the other elements of $\mathcal{G}_Q$, we first compute the powers of $Q$. For this purpose it is convenient to introduce the generating functions $\psi_i^n(s) = \sum_{j=0}^\infty(Q^n)_{ij}s^j$ ($|s| < 1$), $i \in S$. We claim:


$$\psi_i^n(s) = \big[1 - (\tfrac12)^n(1 - s)\big]^i\exp\{-[1 - (\tfrac12)^n](1 - s)\}.$$


When $n = 0$ both sides are $s^i$; the result follows by induction from the following considerations. If we start with $i$ particles at time 0, and $S_n$ denotes the number at time $n$, then $S_{n+1} = X + \sum_{r=1}^{S_n}Y_r$, where $X$ and the $Y_r$ are independent, $X$ Poisson with mean 1/2, the $Y_r$ taking on values 0 and 1 each with probability 1/2. Since $X$ has generating function $e^{-(1-s)/2}$ and each $Y_r$ has generating function $(1 + s)/2$, it follows from the formula for the generating function of a random sum that $\psi_i^{n+1}(s)$, the generating function of $S_{n+1}$, equals $e^{-(1-s)/2}\psi_i^n((1 + s)/2)$ for $n \ge 0$. The desired formula for $\psi_i^n(s)$ satisfies this recursion relation. Hence $(Q^n)_{ij}$, the coefficient of $s^j$ in the power series expansion of $\psi_i^n(s)$, is given by

$$(Q^n)_{ij} = \sum_{k=0}^{\min(i,j)}\binom{i}{k}y^k(1 - y)^{i-k}\,\frac{e^{-(1-y)}(1 - y)^{j-k}}{(j-k)!}, \qquad y = (\tfrac12)^n.$$
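These formulas can be checked on a truncated state space. The sketch below (Python with NumPy) builds $Q$ from its definition, confirms $Q1 = 1$ and $\alpha$-reversibility, and compares $Q^2$ obtained by matrix multiplication with the closed form $(Q^n)_{ij} = \sum_k\binom{i}{k}y^k(1-y)^{i-k}e^{-(1-y)}(1-y)^{j-k}/(j-k)!$ at $y = (\frac12)^2$. The truncation level `M = 80` is an assumption; it is harmless here because the Poisson tails are negligible for the small states inspected.

```python
import math
import numpy as np


def q_power_entry(i, j, y):
    """(Q^n)_{ij} from the closed form, with y = (1/2)**n; y = 1/2 gives Q_ij itself."""
    return sum(
        math.comb(i, k) * y**k * (1 - y) ** (i - k)
        * math.exp(-(1 - y)) * (1 - y) ** (j - k) / math.factorial(j - k)
        for k in range(min(i, j) + 1)
    )


M = 80                                               # truncation of S = {0, 1, ...}
Q = np.array([[q_power_entry(i, j, 0.5) for j in range(M)] for i in range(M)])
alpha = np.array([math.exp(-1) / math.factorial(i) for i in range(M)])

rows_ok = np.allclose(Q[:20].sum(axis=1), 1.0)       # Q1 = 1 (rows far from the cut-off)
flow = alpha[:, None] * Q                            # alpha_i Q_ij
reversible = np.allclose(flow, flow.T, rtol=1e-9, atol=0.0)
closed_form = np.array([[q_power_entry(i, j, 0.25) for j in range(10)] for i in range(10)])
powers_ok = np.allclose((Q @ Q)[:10, :10], closed_form)
print(rows_ok, reversible, powers_ok)                # True True True
```

The same function with $y = 1/2$ and $y = 1/4$ produces both $Q$ and $Q^2$, which is a compact way of seeing that the binomial-thinning-plus-Poisson structure is preserved under iteration.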


If $\mu \in \mathcal{E}_Q$, then according to Theorem 12-37, the limits

$$g_{(m,i)} = c'\lim_{n\to\infty}\frac{(Q^{n+m})_{k_{-n}i}}{(Q^n)_{k_{-n}0}}, \qquad m \in Z,\ i \in S,$$

exist and are strictly positive, for some $c' > 0$ and fixed sequence $k_n$, $n \in Z$. We show next that these limits exist if and only if $\lim_{n\to\infty}k_{-n}/2^n = \theta$ for some $\theta$ ($\ge 0$). Under this assumption

$$\big[1 - (\tfrac12)^{n+m}\big]^{k_{-n}} \to e^{-\theta(1/2)^m}$$

and

$$\binom{k_{-n}}{l}\big[(\tfrac12)^{n+m}\big]^l \to \frac{[\theta(1/2)^m]^l}{l!}.$$

Hence

$$\lim_{n\to\infty}(Q^{n+m})_{k_{-n}i} = \sum_{l=0}^{i}\frac{[\theta(1/2)^m]^l}{l!}\,e^{-\theta(1/2)^m}\,\frac{e^{-1}}{(i-l)!} = e^{-[\theta(1/2)^m + 1]}\,\frac{[\theta(1/2)^m + 1]^i}{i!}.$$


We can take this last quantity to be $g_{(m,i)}$ by choosing $c' = e^{-\theta-1}$. On the other hand, if $\lim_{n\to\infty}k_{-n}/2^n$ does not exist then we must have $k_{-n}/2^n \to \infty$ as $n \to \infty$, for otherwise the defining ratios would converge to distinct limits along different subsequences. But the ratio $(Q^{n+1})_{k_{-n}i}/(Q^n)_{k_{-n}0}$, for example, is a sum of positive terms including

$$\frac{e^{-[1-(1/2)^{n+1}]}[1 - (1/2)^{n+1}]^{k_{-n}+i}/i!}{e^{-[1-(1/2)^n]}[1 - (1/2)^n]^{k_{-n}}} = \frac{e^{-(1/2)^{n+1}}[1 - (1/2)^{n+1}]^i}{i!}\Big[1 + \frac{(1/2)^{n+1}}{1 - (1/2)^n}\Big]^{k_{-n}},$$

and this last expression tends to $\infty$ as $n \to \infty$ if $k_{-n}/2^n \to \infty$. We have therefore shown that the limits defining $g_{(m,\cdot)}$ exist if and only if $k_{-n}/2^n \to \theta \in [0, \infty)$ as $n \to \infty$, in which case $g_{(m,\cdot)}$ as a function of $i$ may be taken to be Poisson with mean $\theta(\frac12)^m + 1$. Since $Q$ is $\alpha$-reversible, $\alpha_i(Q^n)_{ij} = \alpha_j(Q^n)_{ji}$, and hence the function $h_{(m,\cdot)}$ of Theorem 12-37 can be computed as

$$h_{(m,i)} = c''\lim_{n\to\infty}\frac{(Q^{n-m})_{ik_n}}{(Q^n)_{0k_n}} = c''\lim_{n\to\infty}\frac{\alpha_0\,(Q^{n-m})_{k_ni}}{\alpha_i\,(Q^n)_{k_n0}} = c''\,e^{\eta(1-2^m)}(\eta2^m + 1)^i$$

if $k_n(\frac12)^n \to \eta \in [0, \infty)$; otherwise the limit does not exist. Condition (3) of Theorem 12-37 now dictates $c'' = e^{-\eta(1+\theta)}$. In summary, we have proved that if $\mu \in \mathcal{E}_Q$, then $\mu$ is one of the Markov process measures $\mu_{\theta\eta}$, $\theta, \eta \in [0, \infty)$. Here $\theta$ determines $g$, $\eta$ determines $h$, and then $g$ and $h$ determine $\mu_{\theta\eta}$ according to Theorem 12-37. Finally, it remains to check that all the $\mu_{\theta\eta}$ are extreme, so that in fact $\mathcal{E}_Q = \{\mu_{\theta\eta}: \theta, \eta \in [0, \infty)\}$. One first verifies, as in Example 6 of Section 10-13, that the topology of the Martin boundary $B$ associated with $\mathcal{G}_Q$ is the usual topology on $R^2_+ = \{(\theta, \eta): \theta \ge 0, \eta \ge 0\}$. Then by Theorem 12-28,


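$$\mu_{\bar\theta\bar\eta} = \int_0^\infty\!\!\int_0^\infty\mu_{\theta\eta}\,d\lambda(\theta, \eta)$$

for a unique probability measure $\lambda$ on $R^2_+$ such that $\lambda(\{(\theta, \eta): \mu_{\theta\eta} \in \mathcal{E}_Q\}) = 1$. Evaluating both sides on $\{\omega \mid \omega_n = i, \omega_{n+1} = j\}$ and dividing by $j!\,Q_{ij}$, we derive the equation

$$\begin{aligned}
&e^{-[\bar\theta(1/2)^n+1]}\frac{[\bar\theta(1/2)^n+1]^i}{i!}\;e^{-\bar\eta(2^{n+1}+\bar\theta)}\frac{[\bar\eta2^{n+1}+1]^j}{j!}\\
&\qquad= \int_0^\infty\!\!\int_0^\infty\Big\{e^{-[\theta(1/2)^n+1]}\frac{[\theta(1/2)^n+1]^i}{i!}\Big\}\times\Big\{e^{-\eta(2^{n+1}+\theta)}\frac{[\eta2^{n+1}+1]^j}{j!}\Big\}\,d\lambda(\theta, \eta).
\end{aligned}$$

Multiply both sides by $u^iv^j$ ($u, v \in R$), sum over $i, j \in S$, and interchange summation and integration, to obtain

$$e^{v-(1-u)}\,e^{-\bar\theta\bar\eta}\,e^{-\bar\theta(1/2)^n(1-u)}\,e^{-\bar\eta2^{n+1}(1-v)} = e^{v-(1-u)}\int_0^\infty\!\!\int_0^\infty e^{-\theta\eta}\,e^{-\theta(1/2)^n(1-u)}\,e^{-\eta2^{n+1}(1-v)}\,d\lambda(\theta, \eta).$$

If we make the change of variables $x = (1 - u)/2^n$, $y = 2^{n+1}(1 - v)$, then for $x > 0$ and $y > 0$ the right hand side is, apart from the common factor $e^{v-(1-u)}$, the double Laplace transform of the measure $e^{-\theta\eta}d\lambda(\theta, \eta)$. Since the equation is satisfied when $\lambda$ concentrates at $(\bar\theta, \bar\eta)$, the uniqueness theorem for Laplace transforms implies that $\lambda$ must be this measure. We conclude that $\mu_{\bar\theta\bar\eta} \in \mathcal{E}_Q$, as desired.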

Whenever $Q$ is a transition matrix and $\mu \in \mathcal{E}_Q$ admits a representation with $h_{(n,i)} = 1$, then $\pi_{n,i} = g_{(n,i)}$ and $P_n = Q$. A family of probability distributions $\pi_n$, $n \in Z$, such that $\pi_nQ = \pi_{n+1}$, is called an entrance law for $Q$. For example, the Poisson distributions $\pi_n$ with mean $\theta(\frac12)^n + 1$ for fixed $\theta \in [0, \infty)$ constitute an entrance law for the matrix $Q$ of the last example. The case $\theta = 0$ yields the stationary Markov process with regular measure $\alpha$; when $\theta > 0$ the process $\{x_n\}$ with measure $\mu_{\theta0}$ "comes down from infinity" in the sense that $\lim_{n\to-\infty}\Pr[x_n = i] = 0$ for any $i \in S$. A more surprising type of entrance law is described in Problem 8 at the end of the chapter.
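The entrance law can be verified directly: if $\pi_n$ is Poisson with mean $\lambda$, thinning by 1/2 and adding an independent Poisson(1/2) population gives Poisson with mean $\lambda/2 + 1/2$, and $\lambda = \theta(\frac12)^n + 1$ is carried to $\theta(\frac12)^{n+1} + 1$. The sketch below (NumPy assumed; the truncation `M = 60` and $\theta = 1$ are arbitrary choices) checks $\pi_nQ = \pi_{n+1}$ for a few $n$ and shows the mass at state 0 draining away as $n$ decreases.

```python
import math
import numpy as np

M = 60                                    # truncation of S = {0, 1, ...}


def q_entry(i, j):
    """Q_ij of Example 12-41: Binomial(i, 1/2) survivors plus Poisson(1/2) newcomers."""
    return sum(math.comb(i, k) * 0.5**i * math.exp(-0.5) * 0.5 ** (j - k) / math.factorial(j - k)
               for k in range(min(i, j) + 1))


Q = np.array([[q_entry(i, j) for j in range(M)] for i in range(M)])


def entrance(n, theta):
    """pi_n: Poisson law with mean theta * (1/2)**n + 1, truncated to {0, ..., M-1}."""
    mean = theta * 0.5**n + 1
    return np.array([math.exp(-mean) * mean**i / math.factorial(i) for i in range(M)])


theta = 1.0
ok = all(np.allclose(entrance(n, theta) @ Q, entrance(n + 1, theta), atol=1e-12) for n in range(-3, 3))
print(ok)                                                   # True
print([entrance(n, theta)[0] for n in (-1, -3, -5)])        # Pr[x_n = 0] shrinking as n decreases
```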


The matrix $Q$ of Example 12-41 may also be used to illustrate phase multiplicity on the non-negative half-line. Namely, for $\gamma \in [0, \infty)$, let $h^\gamma_{(n,i)} = e^{-\gamma2^n}(\gamma2^n + 1)^i$, and define $\pi^\gamma_{0,i} = \alpha_ih^\gamma_{(0,i)}$, $P^\gamma_{n,ij} = Q_{ij}h^\gamma_{(n+1,j)}/h^\gamma_{(n,i)}$. If we set $\pi_0 = \alpha$, $P_n = Q$, $\pi'_0 = \pi^\gamma_0$ and $P'_n = P^\gamma_n$ in Proposition 12-33, then the hypotheses there are clearly satisfied. Thus we have constructed a large family of one-sided Markov fields with the same characteristics as the Markov chain with transition matrix $Q$ and initial distribution $\alpha$.
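A quick check of the half-line construction may be reassuring. In the sketch below (NumPy assumed; $\gamma = 0.5$ and the truncation `M = 40` are arbitrary choices), $h^\gamma$ should satisfy the harmonic equation of Proposition 12-33 with $P_n = Q$, the vector $\pi^\gamma_0 = \alpha h^\gamma_{(0,\cdot)}$ should be a probability vector, and $P^\gamma_1$ should have unit row sums.

```python
import math
import numpy as np

M = 40                                     # truncation of S = {0, 1, ...}


def q_entry(i, j):
    """Q_ij of Example 12-41."""
    return sum(math.comb(i, k) * 0.5**i * math.exp(-0.5) * 0.5 ** (j - k) / math.factorial(j - k)
               for k in range(min(i, j) + 1))


Q = np.array([[q_entry(i, j) for j in range(M)] for i in range(M)])
alpha = np.array([math.exp(-1) / math.factorial(i) for i in range(M)])


def h(n, gamma):
    """h^gamma_(n, i) = exp(-gamma 2^n) (gamma 2^n + 1)^i for i = 0, ..., M-1."""
    a = gamma * 2.0**n
    return np.array([math.exp(-a) * (a + 1) ** i for i in range(M)])


gamma = 0.5
harmonic = all(np.allclose((Q @ h(n + 1, gamma))[:10], h(n, gamma)[:10]) for n in range(3))
pi0 = alpha * h(0, gamma)
P1 = Q * h(2, gamma)[None, :] / h(1, gamma)[:, None]     # P^gamma_1
print(harmonic, abs(pi0.sum() - 1) < 1e-12, np.allclose(P1[:10].sum(axis=1), 1.0))   # True True True
```

Only rows with small $i$ are compared, since $h^\gamma_{(n,i)}$ grows geometrically in $i$ and the truncated matrix is accurate only away from the cut-off.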

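As a final application of Example 12-41, we exhibit an element of $\mathcal{G}_Q$ which is not a Markov process. Consider the field $\{x_n\}$ given by the convex combination $\mu = \frac12\mu_{00} + \frac12\mu_{11}$. Clearly $\mu \in \mathcal{G}_Q$, and to see that $\mu$ is not the measure for a Markov process it suffices to check that

$$\Pr[x_2 = 0 \mid x_0 = 0 \wedge x_1 = 0] \ne \Pr[x_2 = 0 \mid x_1 = 0].$$

The $\pi_n$ and $P_n$ for $\mu_{11}$ satisfy $\pi_{0,0} = e^{-4}$, $\pi_{1,0} = e^{-9/2}$, $P_{0,00} = e^{-1}Q_{00}$ and $P_{1,00} = e^{-2}Q_{00}$, while for $\mu_{00}$ they are $\pi_{n,0} = e^{-1}$ and $P_{n,00} = Q_{00}$. Thus

$$\Pr[x_2 = 0 \mid x_0 = 0 \wedge x_1 = 0] = \frac{\frac12(e^{-1}Q_{00}Q_{00}) + \frac12(e^{-4}e^{-1}Q_{00}e^{-2}Q_{00})}{\frac12(e^{-1}Q_{00}) + \frac12(e^{-4}e^{-1}Q_{00})} = \frac{1 + e^{-6}}{1 + e^{-4}}\,Q_{00},$$

while

$$\Pr[x_2 = 0 \mid x_1 = 0] = \frac{\frac12(e^{-1}Q_{00}) + \frac12(e^{-9/2}e^{-2}Q_{00})}{\frac12e^{-1} + \frac12e^{-9/2}} = \frac{1 + e^{-11/2}}{1 + e^{-7/2}}\,Q_{00}.$$

Hence $\{x_n\}$ has the two-sided Markov property, but not the one-sided Markov property.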
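The two conditional probabilities can be recomputed mechanically from Theorem 12-37. The plain-Python sketch below encodes, as derived in Example 12-41, $g_{(n,\cdot)}$ as the Poisson law with mean $\theta(\frac12)^n + 1$ and $h_{(n,i)} = e^{-\eta(1+\theta)}e^{\eta(1-2^n)}(\eta2^n + 1)^i$, forms $\pi_{n,0} = g_{(n,0)}h_{(n,0)}$ and $P_{n,00} = Q_{00}h_{(n+1,0)}/h_{(n,0)}$ with $Q_{00} = e^{-1/2}$, and evaluates both sides for the mixture $\frac12\mu_{00} + \frac12\mu_{11}$. The helper names are hypothetical.

```python
import math

Q00 = math.exp(-0.5)                     # Q_00 for the matrix of Example 12-41


def g(n, i, theta):
    """g_(n,i): Poisson probability of i with mean theta * (1/2)**n + 1."""
    mean = theta * 0.5**n + 1
    return math.exp(-mean) * mean**i / math.factorial(i)


def h(n, i, theta, eta):
    """h_(n,i) = exp(-eta (1 + theta)) exp(eta (1 - 2^n)) (eta 2^n + 1)^i."""
    return math.exp(-eta * (1 + theta) + eta * (1 - 2.0**n)) * (eta * 2.0**n + 1) ** i


def zeros(theta, eta, start, length):
    """mu_{theta eta}[x_t = 0 for start <= t < start + length]."""
    p = g(start, 0, theta) * h(start, 0, theta, eta)                  # pi_{start,0}
    for t in range(start, start + length - 1):
        p *= Q00 * h(t + 1, 0, theta, eta) / h(t, 0, theta, eta)      # P_{t,00}
    return p


def mixture(start, length):
    """The same probability under mu = (1/2) mu_00 + (1/2) mu_11."""
    return 0.5 * zeros(0, 0, start, length) + 0.5 * zeros(1, 1, start, length)


given_two = mixture(0, 3) / mixture(0, 2)    # Pr[x_2 = 0 | x_0 = 0 and x_1 = 0]
given_one = mixture(1, 2) / mixture(1, 1)    # Pr[x_2 = 0 | x_1 = 0]
print(given_two, given_one)
```

The printed values agree with $\frac{1+e^{-6}}{1+e^{-4}}Q_{00}$ and $\frac{1+e^{-11/2}}{1+e^{-7/2}}Q_{00}$ respectively, and differ from each other in the third decimal place.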

We conclude this section by mentioning without proof the deepest result to date in the theory of denumerable Markov fields on $Z$. When $T$ has a group structure it is natural to consider the class $\mathcal{I}_V$ consisting of all those $\mu \in \mathcal{G}_V$ which are invariant under translations. The following theorem completely determines the possibilities for $\mathcal{I}_{Q'}$, the translation invariant Markov fields on $T = Z$ with characteristics determined by the strictly positive matrix $Q'$.

Theorem 12-42: If $\mu \in \mathcal{I}_{Q'}$, then $Q' \sim Q$ for some strictly positive $Q$ such that $Q1 \le 1$. If $Q1 = 1$ and $Q$ is ergodic, then $|\mathcal{I}_{Q'}| = 1$. In this case the unique member of $\mathcal{I}_{Q'}$ is the stationary Markov chain on $Z$ with transition matrix $Q$ and $\pi_n$ = the $Q$-regular measure $\alpha$. In all other cases $\mathcal{I}_{Q'} = \emptyset$.


6. Examples of phase multiplicity in higher dimensions 


When $T = Z^d$ for $d \ge 2$, the conclusion of Theorem 12-40 ceases to hold. In other words, there are instances of phase multiplicity for homogeneous potentials $U$ with $S$ finite. This phenomenon is undoubtedly the most important in the theory of random fields, but an
adequate treatment is far beyond the scope and purpose of this chapter. Instead, we briefly discuss two examples. Suggestions for further reading are included in the Additional Notes.

Example 12-43: Tree processes. $S = \{0, 1, \ldots, N\}$ and $T$ is a countable tree (i.e., $T$ is endowed with a neighbor structure $\partial$ which defines a connected graph with no loops). Let $P$ be a strictly positive transition matrix on $S$ with regular measure $\alpha$, and assume that $P$ is $\alpha$-reversible. The random field $\{x_t\}$ on $S^T$ is called a tree process for $(\alpha, P)$ if $\mu$ is defined by the following three properties:

(i) $\Pr[x_t = i] = \alpha_i$ for all $i \in S$, $t \in T$;

(ii) $\Pr[x_{t_0} = i_0 \wedge x_{t_1} = i_1 \wedge \cdots \wedge x_{t_l} = i_l] = \alpha_{i_0}P_{i_0i_1}\cdots P_{i_{l-1}i_l}$ whenever $\{t_0, t_1, \ldots, t_l\}$ is a finite path in the tree $T$ and $i_0, i_1, \ldots, i_l \in S$;

(iii) $\Pr[x_{t_0} = i_0 \wedge x_{t_1} = i_1 \wedge \cdots \wedge x_{t_l} = i_l \mid x_r = i_r \text{ for all } r \in A] = \Pr[x_{t_0} = i_0 \wedge \cdots \wedge x_{t_l} = i_l \mid x_{t_0} = i_0]$ for any finite $A \subset T$ and path $\{t_0, \ldots, t_l\} \subset T$ which intersect only at site $t_0$, and any $i_0, \ldots, i_l$ and $i_r$ ($r \in A$) in $S$.

For given $\alpha$ and $P$, conditions (i)-(iii) determine a well-defined and unique Markov field. According to (i) and (ii) the process behaves like a reversible Markov chain along paths (reversibility ensures that cylinder probabilities are independent of the direction we travel along a path in (ii)), while paths "patch together" because of condition (iii).
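As a special case, suppose that $S = \{0, 1\}$ and $T$ is the tree with three neighbors for every site (sometimes called the 3-Bethe lattice). Consider the tree processes with measures $\mu'$ and $\mu''$ induced by $(\alpha', P')$ and $(\alpha'', P'')$ respectively, where

$$\alpha' = \big(\tfrac13, \tfrac23\big),\quad P' = \begin{pmatrix}\frac23 & \frac13\\[2pt] \frac16 & \frac56\end{pmatrix}; \qquad \alpha'' = \big(\tfrac45, \tfrac15\big),\quad P'' = \begin{pmatrix}\frac89 & \frac19\\[2pt] \frac49 & \frac59\end{pmatrix}.$$

It is not hard to verify that both fields have the same local characteristics, so there is phase transition for the potential $V$ corresponding to these characteristics. For instance,

$$\mu'(\{x_a = 0\} \mid \{x_t = 0 \text{ for all } t \in \partial a\}) = \frac{(\frac13)(\frac23)^3}{(\frac13)(\frac23)^3 + (\frac23)(\frac16)^3} = \frac{32}{33},$$

$$\mu''(\{x_a = 0\} \mid \{x_t = 0 \text{ for all } t \in \partial a\}) = \frac{(\frac45)(\frac89)^3}{(\frac45)(\frac89)^3 + (\frac15)(\frac49)^3} = \frac{32}{33}.$$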

In this setting, if $P = \begin{pmatrix}1-p & p\\ q & 1-q\end{pmatrix}$ then $\alpha = \big(\frac{q}{p+q}, \frac{p}{p+q}\big)$ and $P$ is always $\alpha$-reversible. The $(\alpha, P)$-process is called attractive when $p + q < 1$. Roughly, attractiveness means that a 1 at site $t$ increases the likelihood of 1's near $t$ (and similarly for 0's). More precisely this implies that in Theorem 12-30 $\mu^{x^n}(\{x_a = 1\})$ is maximized when $x^n$ consists of "all 1's" on $K(n)$. Using this fact it is possible to show that in the attractive case there is phase multiplicity for the potential corresponding to $(\alpha, P)$ if and only if $(p - q)^2 - 2(p + q) + 1 \ge 0$ and $(p, q) \ne (\frac14, \frac14)$. The $(\alpha, P)$-process is repulsive when $p + q > 1$, and in this case one can prove by entirely different methods that phase multiplicity occurs if and only if $p + q > \frac32$.
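The claim that $\mu'$ and $\mu''$ share their local characteristics can be checked in exact arithmetic. On the 3-Bethe lattice, (ii), (iii) and reversibility give $\Pr[x_a = i \wedge x_{t_1} = j_1 \wedge x_{t_2} = j_2 \wedge x_{t_3} = j_3] = \alpha_iP_{ij_1}P_{ij_2}P_{ij_3}$ for the three neighbors $t_1, t_2, t_3$ of $a$, so the characteristic at $a$ depends only on how many neighbors equal 1. The sketch below (Python standard library only; helper names are hypothetical) compares the two parametrizations for every such count, and also evaluates $(p - q)^2 - 2(p + q) + 1$ for $P'$ and $P''$; both values are non-negative.

```python
from fractions import Fraction as F


def prob_one(alpha, P, ones, degree=3):
    """Pr[x_a = 1 | `ones` of the `degree` neighbours of a are 1, the others 0]."""
    weight = [alpha[i] * P[i][1] ** ones * P[i][0] ** (degree - ones) for i in (0, 1)]
    return weight[1] / (weight[0] + weight[1])


alpha1, P1 = (F(1, 3), F(2, 3)), ((F(2, 3), F(1, 3)), (F(1, 6), F(5, 6)))   # (alpha', P')
alpha2, P2 = (F(4, 5), F(1, 5)), ((F(8, 9), F(1, 9)), (F(4, 9), F(5, 9)))   # (alpha'', P'')

same = all(prob_one(alpha1, P1, k) == prob_one(alpha2, P2, k) for k in range(4))
print(same, 1 - prob_one(alpha1, P1, 0))     # True 32/33


def criterion(P):
    """(p - q)^2 - 2(p + q) + 1 for P = [[1 - p, p], [q, 1 - q]]."""
    p, q = P[0][1], P[1][0]
    return (p - q) ** 2 - 2 * (p + q) + 1


print(criterion(P1), criterion(P2))          # 1/36 0
```

Exact fractions are used so that "the same local characteristics" is tested as an equality rather than up to rounding.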

Random fields on countable trees T have the advantage that most 
physically meaningful quantities can be computed explicitly; the fact 
that T has no loops enables one to use inductive methods. The natural 
setting for statistical mechanics is $T = Z^d$, however, and in this case
the theory is immensely more difficult. We summarize some of the 
leading results for the simplest Markov fields on the two-dimensional 
integer lattice in our last example. 


Example 12-44: Two-dimensional Ising model. $S = \{0, 1\}$, $T = Z^2$. $V$ is a normalized potential of the form:

$$V_{\{a\}}(\omega) = v_0, \qquad V_{\{a,b\}}(\omega) = v_1$$

whenever $|a - b| = 1$ and $i_a = i_b = 1$, with $V_A(\omega) = 0$ in all other cases. $V$ is attractive if $v_1 > 0$, repulsive otherwise; the intuitive interpretation is the same as in the previous example. When $V$ is attractive there is phase multiplicity if and only if $v_0 + 2v_1 = 0$ and $v_1 > 2\ln(\sqrt2 + 1)$. For repulsive $V$ there is phase multiplicity in an open neighborhood of the line segment $\{(v_0, v_1): v_0 + 2v_1 = 0 \text{ and } v_1 < K\}$, with $K$ sufficiently negative. Similar results hold in higher dimensions, though less is known.
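One way to see why the line $v_0 + 2v_1 = 0$ appears: each site of $Z^2$ has four neighbors, and when $k$ of them equal 1 the local characteristic is $\Pr[x_a = 1] = e^{v_0 + kv_1}/(1 + e^{v_0 + kv_1})$. The characteristics are unchanged under interchanging 0's and 1's exactly when $v_0 + kv_1 = -(v_0 + (4 - k)v_1)$ for every $k$, that is, when $v_0 + 2v_1 = 0$. The sketch below (Python standard library; $v_1 = 1.9$ is an arbitrary value above the threshold) checks this symmetry and prints the threshold $2\ln(\sqrt2 + 1)$.

```python
import math


def prob_one(v0, v1, k):
    """Pr[x_a = 1 | k of the 4 neighbours of a equal 1] for the potential of Example 12-44."""
    energy = v0 + k * v1
    return math.exp(energy) / (1 + math.exp(energy))


def flip_symmetric(v0, v1):
    """True when Pr[x_a = 1 | k ones] = Pr[x_a = 0 | 4 - k ones] for every k."""
    return all(abs(prob_one(v0, v1, k) - (1 - prob_one(v0, v1, 4 - k))) < 1e-12 for k in range(5))


v1 = 1.9                                     # hypothetical coupling
print(flip_symmetric(-2 * v1, v1), flip_symmetric(-2 * v1 + 0.1, v1))   # True False
print(2 * math.log(math.sqrt(2) + 1))        # attractive threshold for v1
```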


7. Problems 


1. Show that if $S = \{0, 1\}$, $T$ is a finite subset of $Z^d$, and $V$ is a normalized neighbor potential, then the energy $H_V$ of $V$ may be expressed as

$$H_V(\omega) = \frac12\sum_{a\in T}\ \sum_{b\in T:\,|a-b|\le1}v_{ab}\,i_ai_b, \qquad \omega = \{i_a\} \in \Omega,$$

for some $v = \{v_{ab} \in R: |a - b| \le 1\}$ satisfying $v_{ab} = v_{ba}$.

2. Give an example of a finite Gibbs field which cannot be represented in 
terms of a pair potential. 

3. Let $\partial$ be a given neighbor system. The $K$-neighbor set $\partial^Ka$ ($K = 1, 2, \ldots$) of $a \in T$ is defined recursively by $\partial^1a = \partial a$, $\bar\partial^0a = \{a\}$, and for $K \ge 1$, $\bar\partial^Ka = \partial^Ka \cup \bar\partial^{K-1}a$ and $\partial^{K+1}a = \partial(\partial^Ka) - \bar\partial^Ka$. Thus a $K$-neighbor $t$ of $a$ is a site which can be reached from $a$ in $K$ steps to neighboring sites, and no fewer. A random field $\{x_t\}$ is called a $K$-Markov field (with respect to $\partial$) if $\mu_a^A = \mu_a^{\bar\partial^Ka}$ whenever $\bar\partial^Ka \subset A \subset T$, $A$ finite. A potential $U$ is called a $K$-neighbor potential if $U_A = 0$ whenever $A$ contains two sites which are not $K$-neighbors. Show that $\{x_t\}$ is $K$-Markov if and only if the canonical potential $V$ for $\{x_t\}$ is $K$-neighbor. What does this say when $K = 0$? [Hint: Define a new neighbor system.]


4. Suppose that there is a metric $d$ defined on $T$, and say that $\{x_t\}$ is $L$-Markov ($L > 0$) if $\mu_a^A = \mu_a^{B(a,L)}$ whenever $B(a, L) = \{t \in T: d(a, t) \le L\} \subset A \subset T$. What property for the canonical potential $V$ is equivalent to the $L$-Markov property for $\{x_t\}$? State and prove a theorem to justify your assertion. Describe the $\sqrt2$-Markov fields on $Z^2$.


5. Show that if $\{x_t\}$ has any neighbor potential $U$, then $\{x_t\}$ is a Markov field. Give an example of a Markov field with potential $U$, such that $U$ is not a neighbor potential. [Hint: For the first part, look carefully at the proof of Theorem 12-16.]


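6. Let $T = Z$. Prove that $\mathcal{G}_Q = \emptyset$ if

$$\lim_{m,n\to\infty}\ \sup_{i,k\in S}\frac{(Q^m)_{ij}(Q^n)_{jk}}{(Q^{m+n})_{ik}} = 0 \quad\text{for some } j \in S.$$

[Hint: Show that this condition forces $\Pr[x_0 = j] = 0$.]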


7. Give an example of a Markov process $\{x_n\}_{n\in Z}$ which is not extreme in its class of Markov fields. [Hint: Use the matrix $Q$ of Example 12-41.]


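8. Let $S = \{0, 1, \ldots\}$, $T = Z$. Define $\alpha$ inductively by $\alpha_0 = \frac12$, and

$$\alpha_i = \frac{\alpha_{i-1}}{3} + \frac12\Big(\frac14\Big)^i \quad\text{for } i \ge 1.$$

Let

$$Q_{0j} = \alpha_j, \quad j \in S; \qquad Q_{ij} = \Big(1 - \frac{\alpha_{i-1}}{3\alpha_i}\Big)\alpha_j + \frac{\alpha_{i-1}}{3\alpha_i}\,\delta_{(i-1)j}, \quad i \ge 1,\ j \in S.$$

Finally, put

$$\pi_{n,i} = \begin{cases}\alpha_i + (\delta_{(-n-1)i} - \alpha_i)\displaystyle\prod_{k=-n}^{\infty}\frac{\alpha_{k-1}}{3\alpha_k}, & n < 0,\\[6pt] \alpha_i, & n \ge 0.\end{cases}$$

Show that $\alpha$ and all of the $\pi_n$ are strictly positive probability vectors on $S$, and that $Q$ is a strictly positive transition matrix with $Q1 = 1$. Finally, prove that $\alpha Q = \alpha$, $\pi_nQ = \pi_{n+1}$, and $\pi_n \ne \alpha$ for $n < 0$. Thus $Q$ has an entrance law which agrees with the stationary one from time 0 on, but not before time 0.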


9. Let $V$ be any neighbor potential on $Z$. Show that if $\mu \in \mathcal{E}_V$, then $\{x_n\}$ is a Markov process.


10. Suppose $g$ and $h$ satisfy (1)-(3) of Theorem 12-37 for some $Q > 0$. Show that $\pi_{n,i}$ and $P_{n,ij}$ as prescribed in that theorem give rise to a well-defined field $\{x_n\}$. Is $\{x_n\}$ in $\mathcal{G}_Q$? Is it in $\mathcal{E}_Q$?


NOTES 


Chapter 3: 


Stochastic processes with the martingale property were first studied by Lévy [1937]. Lévy considered, in Sections 67 to 70, partial sums of sequences $\{f_n\}$ such that

$$M[f_{n+1} \mid f_0 \wedge \cdots \wedge f_n] = 0.$$


These are a natural generalization of sums of independent random variables 
with mean 0. He proved theorems such as a central limit theorem suggested 
by comparison with sums of independent random variables. Ville [1939] 
recognized the importance of studying processes representing a fair game and 
for which system theorems should hold. He called these processes martin- 
gales. Although he did not prove any convergence theorems, he did prove 
the inequality given in Problem 7. From this he was able to conclude that 
non-negative martingales had finite lim sup with probability one. He made 
application of this to the study of sample paths of coin tossing. In par- 
ticular, he proved one half of the law of the iterated logarithm. The basic 
convergence theorem, Theorem 3-12, for martingales was proved by Doob 
[1940]. In his book on stochastic processes, Doob [1953] introduced sub- 
martingales (called semi-martingales in that book) and made a systematic 
study of the system theorems and convergence theorems for these processes. 
The proof of Proposition 3-11 for martingales is due to Doob [1940]. The 
proof given here and the extension to submartingales is due to Snell [1952]. 
Additional applications of martingale theory to Markov chains may be found 
in Lamperti [1960a] and [1963a]. 


Chapter 4: 


Markov chains with a finite number of states were introduced by Markov 
[1907]. Kolmogorov [1936] considered the case of a denumerable number 
of states. Important contributions in the foundations of Markov chains 
were made by Doeblin [1938]. There are a number of books devoted to the 
study of finite Markov chains. Among these are Fréchet [1938], Romanovskii 
[1949], Kemeny and Snell [1960], Lahres [1964], and Gordon [1965]. The
theory of denumerable Markov chains is the subject of a book by Chung 
[1960]. 

Finite random walks have been analyzed in some detail in Kemeny and 
Snell [1960], Chapter 7. See also Kac [1947a]. The books by Spitzer 
[1964] and Kemperman [1961] give detailed studies of Markov chain problems 
applied to sums of independent random variables. The class of random 
walks discussed in Example 8 was introduced by Karlin and McGregor 
[1959] who made an extensive study of these processes. There is a large
literature on branching processes. References to this literature as well as an
account of the theory of these processes may be found in a book of Harris 
[1963]. The process called the basic example in this book is often referred 
to as a “renewal process.” 

The recognition of the need for and the importance of system theorems is 
due to Doob. See Doob [1953], Chapter VII. The strong Markov property 
which holds for any denumerable Markov chain does not hold for general 
Markov processes where time is allowed to be continuous and the state space 
is the real line. For a discussion of this problem in the more general setting, 
see Blumenthal [1957]. 

A discussion of system theorems and a rather systematic use of these 
theorems in Markov chain theory may also be found in Chung [1960]. 


Chapter 5: 


Kemeny and Snell [1960], Chapter III, showed that the fundamental 
matrix N could be used to obtain moments of many descriptive quantities 
for finite absorbing chains. The extension of this use of N to denumerable 
chains was made in Kemeny and Snell [1961b]. Theorem 5-10 is the analog 
of the Riesz Decomposition Theorem for superregular functions. A 
systematic discussion of results of this type which exploit the analogy 
between superregular functions for a Markov chain and classical super- 
harmonic functions may be found in Feller [1956] and Doob [1959]. The 
proof of Proposition 5-20 is due to Dynkin and Malyutov [1961]. 

Proposition 5-22 is true even if we drop the hypothesis of finitely many 
k-values. This theorem is due to Chung and Erdos [1951], and a simplified 
proof of this result was given by Chung and Ornstein [1962]. 


Chapter 6: 


Theorem 6-9 is due to Derman [1954]. He proved existence by showing that $\alpha_j = {}^iN_{ij}$ is a regular measure. His proof of uniqueness is less elementary than ours and uses the Doeblin ratio theorem applied to the chain reversed by $\alpha$. This ratio theorem proved by Doeblin [1938] states for recurrent chains that $\lim_{n\to\infty}(N_{ij}^{(n)}/N_{kl}^{(n)})$ exists and is independent of $i$ and $k$. Derman also used the identification of this limit as $\alpha_j/\alpha_l$.

Proposition 6-24 is due to Kac [1947b]. 

Doeblin [1938] proved Proposition 6-32 and then applied limit theorems 
for sums of independent random variables to obtain limit theorems for 
Markov chains. For details of this technique and resulting limit theorems, 
see Chung [1960], pp. 75-106. The converse to Proposition 6-32 is due to 
Yosida and Kakutani [1940]. 

The fact that $\lim_{n\to\infty}P^n$ exists for a noncyclic recurrent chain is due to
Kolmogorov [1936], [1937]. The extension of this result given in Theorem 
6-38 is due to Orey [1962], and the first proof (including Lemmas 6-36 and 
6-37) is a somewhat simplified version of his proof. The proof using the 
Renewal Theorem may be found in Feller [1957]. Theorem 6-43 is new. 


Chapter 7: 


The details of the connection between Brownian motion and classical 
potential theory are discussed by Knapp [1965]. The recognition of the 
importance of identifying these two theories started with Kakutani [1944].
Doob [1954] made significant extensions of the results of Kakutani by further
identifying martingale and submartingale theory with the theory of harmonic
and superharmonic functions. Important contributions again exploiting 
connections between Brownian motion and classical potential theory were 
made by Kac [1951]. The next major contribution was made by Hunt 
[1957], [1958]. Hunt showed that one could develop a potential theory for 
essentially the most general Markov process. He considers continuous 
time and abstract state space. He showed conversely that under rather 
minimal requirements for a potential theory one can construct a Markov 
process associated with this theory. His work related to the potential 
theory that goes with transient processes. 

Although many of Hunt’s results go over easily to the Markov chain case, 
even for transient chains new problems arise, and a whole new theory must 
be developed for the recurrent case. These extensions were made by 
Kemeny and Snell [1961b] for general Markov chains and by Spitzer [1962] 
for the important class of Markov chains which arise from sums of lattice- 
valued independent random variables. 

The fact that the symmetric random walk in one and two dimensions is 
recurrent, whereas in dimension three or greater it is transient was first 
proved by Polya [1921]. The proof of Proposition 7-10 was supplied by 
Lamperti. 


Chapter 8: 


The notion of h-regular function was introduced into the study of potential 
theory by Brelot [1956] and the corresponding idea of a function regular in 
the h-process for chains was discussed in Feller [1956] and Doob [1959]. 

A discussion of equilibrium potential, equilibrium charge, and capacity as 
they arise in electrostatics and Newtonian potential theory may be found in 
the book of Kellogg [1929]. A somewhat more modern approach may be 
found in Brelot [1959]. In the classical theory the Green’s function which 
plays the role of the matrix N is always symmetric. The fact that there is an 
interesting potential theory even for nonsymmetric operators in probability 
was first shown by Hunt [1957], [1958]. 

The results of Sections 1 and 2 of this chapter were for the most part in 
Doob [1959]. Those of Sections 3 and 4 are specializations to the Markov 
chain case of results obtained by Hunt [1957], [1958] for more general 
Markov processes. 

Choquet and Deny [1956-57] investigated the problem of the relation 
between the various potential principles for the case of potentials of the form 
g = Gf, where G is an arbitrary non-negative finite matrix. If G has an 
inverse they proved that the Principles of Balayage and Domination are 
equivalent and that each implies the Principle of Lower Envelope. Here 
every non-negative function f is a charge. They showed further that if G 
satisfies the Principle of Lower Envelope, then there is a unique permutation 
of the columns of G such that the resulting matrix satisfies the Principle of 
Balayage. Also they showed that G satisfies the Principle of Balayage if 
and only if it is of the form $G = A \sum_{n=0}^{\infty} S^n$, where A is a diagonal matrix
with strictly positive diagonal entries and S is a non-negative matrix. Thus
the most general operator here is only slightly more general than the class of
all matrices of the form $N = (I - Q)^{-1}$, where Q is a finite transient chain.
Some investigation of this problem for denumerable matrices was made by 
Kemeny and Snell [1961b]. 
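For a finite transient Q, the matrix $N = (I - Q)^{-1}$ is also the series $I + Q + Q^2 + \cdots$, and $N_{ij}$ is the mean number of times the chain started at i is in j (counting time 0). The following sketch uses plain Python lists and hypothetical helper names chosen only for illustration. It sums the series until the terms are negligible, and a matrix of the Choquet-Deny form $A \sum S^n$ with $A = I$ and $S = Q$ is of exactly this type.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def fundamental_matrix(Q, tol=1e-14, max_terms=10_000):
    """Approximate N = I + Q + Q^2 + ... for a substochastic transient Q."""
    n = len(Q)
    N = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in N]  # current power Q^k, starting at Q^0 = I
    for _ in range(max_terms):
        term = mat_mul(term, Q)
        N = [[x + y for x, y in zip(r, t)] for r, t in zip(N, term)]
        if max(abs(x) for row in term for x in row) < tol:
            break  # remaining terms are negligible for a transient Q
    return N


Q = [[0.0, 0.5], [0.5, 0.0]]  # hypothetical transient part of a chain
N = fundamental_matrix(Q)
print(N)  # entries close to [[4/3, 2/3], [2/3, 4/3]]
```

Multiplying the result by $I - Q$ should return the identity up to rounding, which is a convenient check on the truncation.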
The notion of energy seems to be significant only in the case of reversible
chains and even here it does not have the nice probabilistic interpretations 
that the other potential theory concepts have. 

The results of Section 8 are taken from Kemeny and Snell [1961b]. The 
idea of developing a potential theory for supermartingales was suggested by 
Doob [1961] where he also indicated the proof of Proposition 8-79. 


Chapter 9: 


The potential theory for recurrent chains discussed in this chapter was 
introduced by Kemeny and Snell [1961b]. 

The existence of the limit in Theorem 9-4 was first proved by Doeblin 
[1938]. The identification of the limit was made by Chung [1950]. The 
present proof is from Kemeny [1962]. 

Theorem 9-7 was proved under a mild assumption in Kemeny and Snell 
[1961b]. The case i = j was proved in general by a method due to Chung in 
Chung [1961] and Kemeny and Snell [1961c]. The general case was proved 
by Kemeny [1963]. 

The fact that all ergodic chains are normal follows from Theorem 4, 
Chapter 1, Section 11 of Chung [1961]. The remaining results in the first 
three sections are taken primarily from Kemeny and Snell [1961b]. 

Proposition 9-65 was first proved by Lamperti [1960b]. Results of the 
form of Propositions 9-67 and 9-68 may be found in Chung [1960] Chapter 1, 
Section 11. 

The notion of strong ergodic chains introduced here is new. The matrix Z 
was introduced in Kemeny and Snell [1960]. It was shown in that book that
the matrix Z for finite ergodic chains could be used to express the moments 
of many interesting descriptive quantities and hence played for recurrent 
chains a role similar to the matrix N for finite absorbing chains. 
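As a small illustration of how Z enters such formulas for a finite ergodic chain, the sketch below computes $Z = (I - P + A)^{-1}$, where every row of A is the stationary vector α. It then forms the mean first passage times $m_{ij} = (\delta_{ij} - z_{ij} + z_{jj})/\alpha_j$, which give $m_{ii} = 1/\alpha_i$ on the diagonal. These are the standard finite-chain identities. The code uses exact rational arithmetic, and its helper names are hypothetical.

```python
from fractions import Fraction as F


def inverse(A):
    """Gauss-Jordan inverse of a square matrix with exact Fraction arithmetic."""
    n = len(A)
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # StopIteration if singular
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def stationary(P):
    """Solve alpha P = alpha with sum(alpha) = 1 (last equation replaced by normalization)."""
    n = len(P)
    A = [[P[j][i] - (i == j) for j in range(n)] for i in range(n)]
    A[-1] = [1] * n
    b = [0] * (n - 1) + [1]
    Ainv = inverse(A)
    return [sum(Ainv[i][k] * b[k] for k in range(n)) for i in range(n)]


def mean_passage_times(P):
    """Return alpha, Z = (I - P + A)^{-1}, and the mean first passage matrix M."""
    n = len(P)
    alpha = stationary(P)
    Z = inverse([[(i == j) - P[i][j] + alpha[j] for j in range(n)] for i in range(n)])
    M = [[((i == j) - Z[i][j] + Z[j][j]) / alpha[j] for j in range(n)] for i in range(n)]
    return alpha, Z, M


P = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]  # hypothetical two-state chain
alpha, Z, M = mean_passage_times(P)
print(alpha)  # [Fraction(1, 3), Fraction(2, 3)]
```

For this P the result is $m_{11} = 3$, $m_{12} = 2$, $m_{21} = 4$, $m_{22} = 3/2$. For a two-state chain these agree with the direct answers $m_{12} = 1/p_{12}$ and $m_{21} = 1/p_{21}$.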

All sums of independent random variables processes which form aperiodic 
recurrent Markov chains are normal. This was proved by Kemeny and Snell 
[1961a] for the case of finite variance and in general by Spitzer [1962]. 

The operator K was introduced by Kemeny and Snell [1963b]. Most of 
the results of Sections 8 and 9 are taken from this paper. 

The method of associating denumerable chains with electric circuits
discussed in Section 10 was carried out by Nash-Williams [1959] under
slightly stronger restrictions on the chain than we impose. He also proved
Lemma 9-129.
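The correspondence can be illustrated on a finite network. Give each edge a conductance $c_{ij} = c_{ji}$, put $c_i = \sum_j c_{ij}$ and $P_{ij} = c_{ij}/c_i$, and the result is a reversible chain. Hold the voltage v at 1 at a state a and at 0 at a state b, and let it be harmonic elsewhere. The probability, starting at a, of reaching b before returning to a is then $\sum_j P_{aj}(1 - v_j)$, which equals $1/(c_a R_{\mathrm{eff}})$, where $R_{\mathrm{eff}}$ is the effective resistance between a and b. The sketch below computes this escape probability by Gauss-Seidel iteration, a choice made for simplicity rather than taken from Nash-Williams, and its names are hypothetical. For an infinite network one lets b recede to infinity, and recurrence corresponds to the escape probability tending to 0.

```python
def escape_probability(cond, a, b, tol=1e-12, max_sweeps=100_000):
    """Probability, starting at a, of hitting b before returning to a.

    cond maps each state to a dict {neighbor: conductance}; it must be symmetric.
    """
    v = {x: 0.0 for x in cond}
    v[a] = 1.0  # unit voltage at a, ground (0) at b
    for _ in range(max_sweeps):
        change = 0.0
        for x in cond:
            if x in (a, b):
                continue
            c_x = sum(cond[x].values())
            # Harmonic condition: v(x) is the conductance-weighted average of its neighbors.
            new = sum(c * v[y] for y, c in cond[x].items()) / c_x
            change = max(change, abs(new - v[x]))
            v[x] = new
        if change < tol:
            break
    c_a = sum(cond[a].values())
    # From a, step to y with probability c_ay / c_a, then escape with probability 1 - v(y).
    return sum(c * (1.0 - v[y]) for y, c in cond[a].items()) / c_a


# Three unit resistors in series between states 0 and 3 (hypothetical example).
series = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
print(escape_probability(series, 0, 3))  # close to 1/3, since R_eff = 3 and c_0 = 1
```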


Chapter 10: 


The Martin boundary for Markov chains was introduced independently by 
Doob [1959] and Watanabe [1960a]. Doob and Watanabe used the methods
which were developed in the study of the classical Martin boundary relevant 
to Newtonian potentials. Details of this approach may be found in Brelot 
[1956] and Doob [1957] or Watanabe [1960a]. 

Hunt [1960] gave a new and more probabilistic treatment of Martin 
boundary theory for Markov chains and completed the work of Doob and 
Watanabe in several ways. In particular, he introduced a new class of 
processes called approximate P-chains. These are slightly more general 
than the processes we have called extended chains. Our treatment of the 
Martin boundary is for the most part a rewriting of Hunt’s paper with more 
detail supplied, except that we have used a slightly different definition of
boundary than that listed by Hunt. The difference is as follows. Doob
introduced a metric on the state space, the one we have used if π assigns all
its weight to one point, and completed the state space in terms of this 
metric. A point was called a boundary point by Doob in the completed 
space if it was a limit point of the original states. It is possible for one of 
the original states to be such a limit point. To avoid this peculiarity, Hunt 
modified Doob’s metric slightly to make such a point into a new point. 
Since these new points are always nonminimal points and appear to play no 
essential role in the theory, we have followed Doob’s metric. However, we 
have chosen to call the boundary simply the new points added by the 
completion. The observation that πN > 0 is the only condition on π that
is needed appears in Orey [1964].

One can also use G. Choquet’s theory of convex cones to develop Martin 
boundary theory. This approach has been carried out by Neveu [1964]. 
See also Hennequin and Tortrat [1965]. 

Brelot [1956] showed that the Martin boundary was ideally suited to the 
study of the first boundary problem, or the Dirichlet problem. His approach 
was to generalize the method developed by Perron and Wiener (see Kellogg 
[1929]) for regions in Euclidean space. 

The probabilistic approach to the first boundary problem was first sug- 
gested by Kakutani [1945] and done more generally using the Martin bound- 
ary by Doob [1958]. The method presented in this book is the probabilistic 
approach of Kakutani and Doob. 

The discussion of fine boundary limits follows that of Doob [1957], who 
considered these problems for superharmonic functions using Brownian 
motion theory. 

The Martin boundary has now been worked out for several important 
classes of Markov chains. In particular, Doob, Snell and Williamson [1960] 
have worked out the boundary for general sums of independent random 
variables. Related results may be found in Dynkin and Malyutov [1961]. 
There are close connections between classical moment problems and the 
Martin boundary for certain of these processes. A discussion of this point 
may be found in Watanabe [1960b]. Lamperti and Snell [1963] discussed 
the Martin boundary for the class of random walks introduced by Karlin and 
McGregor [1959]. This discussion was generalized by Kemeny [forthcoming]. 
Finally Blackwell and Kendall [1964] have given a discussion of the Martin 
boundary for the Polya urn scheme. 

The result in Example 3 that the only positive regular functions for the 
symmetric random walk in three dimensions are the constants was proved by 
Murdoch [1954] by other methods. Murdoch obtained a better estimate of
$N_{0j}$ than that given here. The short proof of the estimate for $N_{0j}$ that we
give was supplied by E. Stein.

The results in Problems 30 to 34 were discovered by Harris [1957] and 
Veech [1963]. 

A point x of S* is regular for the Dirichlet problem if for each continuous
function f ≥ 0 on S* the superregular function h with f as boundary values
has $\lim_{j \to x} h(j) = f(x)$. An equivalent condition on x is that for each open
neighborhood U of x, $\lim_{j \to x} \Pr_j[x_\infty \in S^* - U] = 0$. Knapp [1966] showed
that the set of regular points is a Borel set and gave an example of a chain P
with P1 = 1 for which the set of regular points was empty.
Chapter 11:


The main results of this chapter were presented in a paper by Kemeny 
and Snell [1963a]. As noted in this paper, these authors are indebted to 
W. A. Veech for the important Lemma 11-11. The recurrent boundary was 
introduced independently by Orey [1964]. 


ADDITIONAL NOTES 


The discussion below deals with some of the developments in the theory 
since the publication of the first edition. Some papers that predate that 
publication are mentioned to put matters in context. Citations point to the 
Additional References except when the bracketed date is followed by “R.” 
The latter citations point to the References section. 


Chapter 12: 


The foundations for the theory of Markov fields and random fields in 
general were developed by Dobruschin (e.g., [1968]) in a series of papers. 
Our treatment of finite random fields and the equivalence theorem for 
Markov and neighbor Gibbs fields, as presented in Sections 2 and 3, is based 
on Griffeath [1973]. K. L. Chung and D. Dawson made helpful improve- 
ments in the presentation. Theorem 12-16 is due in essence to Averintsev 
[1970], though he considered only the case T = Z°. A series of papers, 
culminating in Grimmett [1973], exploited the Möbius inversion formula to 
obtain simpler proofs in a more general context. The Martin boundary 
approach to infinite Gibbs fields is due to Föllmer [1975a], and is based on
the work of Dynkin [1971]. Their setting is far more general than the one 
presented here, so their arguments are not as elementary. The detailed 
study of countable Markov fields on Z was initiated by Spitzer [1975a]; much 
of Section 5 is based on his paper. Föllmer [1975b] has also treated this
subject. The proof of Proposition 12-32 was supplied by H. Kesten. 
Theorems 12-36 and 12-37 were obtained by Spitzer using tail fields rather 
than the Martin boundary approach. The ratio-limit representation of 
Theorem 12-37, which does not appear in Spitzer’s paper, is cited by Cox 
[1976]. Example 12-38, due to Spitzer, makes use of Doob, Snell, and 
Williamson [1960R]. The important Theorem 12-40 was discovered by 
Dobruschin [1968]. Example 12-41 is a special case of a family of phase 
transition examples discussed by Cox [1976]. The remarkable Theorem
12-42 is due to Kesten [1976]; the reader is referred to his paper for the proof. 
Spitzer [1974] gives a very nice exposition of many aspects of random field 
theory not discussed here. Another useful reference is Dawson [1974]. 
Tree processes were first studied by Preston [1974], whose book contains 
a wealth of information on random fields. A more recent reference is 
Spitzer [1975b]. A lucid exposition of the Ising model may be found in 
Griffiths [1972]. Problems 7 and 8 are derived from Spitzer [1975a]; Problem 
9 is based on a construction of S. Kalikow [1976].


Corrections to the first edition: 


Pitman [1974] pointed out that Theorem 9-53 was stated incorrectly in 
the first edition. It asserted that g = −Gf, overlooking the
possible failure of the relation $A({}^{0}Nf) = (A\,{}^{0}N)f$. This theorem and its
consequences have been corrected in the second edition. 

For general ergodic chains we can give no useful condition under which this 
associativity holds. However, for a strong ergodic chain associativity holds 
if f is bounded (and in particular if the potential g is bounded, since
f = (I − P)g). In fact, it is enough to observe that a°N1 < a°N1 = Myo < ∞.
Thus for a strong ergodic chain, g = −Gf if the charge f is bounded.
This conclusion was obtained in a more direct way in Proposition 9-73.


Martingales:


Two books that develop the subject of martingales are those by Meyer 
[1972] and Neveu [1972a]. Both books begin with the basic material on 
martingales. Neveu’s contains a short chapter on the optimal stopping 
problem that is discussed later in these notes. The latter part of each book 
begins to reflect the explosion in the subject of martingales that has centered 
around integral inequalities. 

Early developments were by Burkholder [1966] and Gundy [1967]. Suppose
$(f_n, F_n)$ is a martingale and $d_n$ is the sequence of differences $d_0 = f_0$,
$d_n = f_n - f_{n-1}$ for $n \ge 1$, so that $f_n = d_0 + \cdots + d_n$. If $v_n$ is measurable
with respect to $F_{n-1}$, then the sequence $g_n$ with

$$g_n = \sum_{k=0}^{n} v_k d_k$$

is called a transform of $f_n$. It is a martingale if $M[|g_n|] < \infty$ for all n, by
imitation of the proof of Proposition 3-7. (The case that the $v_n$ are
characteristic functions arises in the proof of the Upcrossing Lemma and is the
case of optional sampling.) Burkholder and Gundy deal with questions of
convergence and integral boundedness of such transforms. The gambling
interpretation is as follows: A gambler playing a sequence of rounds in a fair
game can win $d_n$ dollars in round n, and his fortune is then $f_n$. If
$\sup_n M[|f_n|] < \infty$, his fortune converges a.e. to a finite limit $f_\infty$, and
$M[|f_\infty|] \le \sup_n M[|f_n|]$. In the transformed game he is allowed to vary the
stakes according to his past experience; at time n he can win $v_n d_n$ dollars.
How can he improve his circumstances by choosing $v_n$ suitably? Burkholder's
first theorem is that if $\sup_n |v_n| < \infty$ a.e., then $g_n$ converges a.e. to a finite
limit $g_\infty$; however, $M[|g_\infty|]$ may be infinite.
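The claim that a transform of a martingale is again a martingale can be checked exhaustively in a simple case. Take a fair ±1 coin-tossing game with $f_0 = 0$ (so $d_0 = 0$), write the tosses as $d_1, \ldots, d_n$, and use a stake rule $v_n$ that looks only at outcomes before round n. The sketch below assumes a hypothetical "double after a loss" rule, enumerates every outcome sequence, and verifies with exact arithmetic that the conditional mean of $g_n$ given the first $n - 1$ rounds equals $g_{n-1}$.

```python
from fractions import Fraction
from itertools import product


def stake(history):
    """Hypothetical predictable rule: bet 1, double after each loss, reset after a win."""
    v = 1
    for d in history:
        v = 2 * v if d < 0 else 1
    return v


def transform(ds):
    """Path of g_n = sum_k v_k d_k, where v_k is computed from d_1, ..., d_{k-1} only."""
    g, path = 0, []
    for k, d in enumerate(ds):
        g += stake(ds[:k]) * d
        path.append(g)
    return path


def is_martingale(n_rounds):
    """Check M[g_n | d_1, ..., d_{n-1}] = g_{n-1} for every history of fair +-1 tosses."""
    for n in range(1, n_rounds + 1):
        for history in product((1, -1), repeat=n - 1):
            g_prev = transform(history)[-1] if history else 0
            mean_next = sum(Fraction(transform(history + (d,))[-1], 2) for d in (1, -1))
            if mean_next != g_prev:
                return False
    return True


print(is_martingale(8))  # True
```

The check works because the stake is fixed before round n is played, so the two equally likely outcomes of $d_n$ shift $g_{n-1}$ by equal and opposite amounts.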

In studying a martingale $\{f_n\}$, Burkholder and Gundy work with the
function $\left(\sum_{n=1}^{\infty} |f_n - f_{n-1}|^2\right)^{1/2}$ and generalizations. This is called the
S-function; some of its properties of convergence and average size are
comparable with those of $\{f_n\}$. The prototype for such conclusions is the
Khintchine-Kolmogorov Theorem: Let $\{y_n\}$ be independent random variables
with $M[y_n] = 0$ and $M[|y_n|^2] = \sigma_n^2$. If $\sum_n \sigma_n^2 = \sigma^2 < \infty$, then $\sum_n y_n$
is convergent a.e. and in $L^2$. Conversely, if $\sum_n y_n$ is convergent in $L^2$, then
$\sum_n \sigma_n^2 = \sigma^2 < \infty$ and $M[|\sum_n y_n|^2] = \sigma^2$. There is a corresponding martingale
result with $f_n - f_{n-1}$ in place of $y_n$.

There is a parallel between the theory of martingales and some of the 
developments in Euclidean Fourier analysis, and some of the theorems in 
each of the areas motivate theorems in the other. Let $Q_0$ be the unit cube in
n-dimensional space $\mathbb{R}^n$, and let $Z_k$ be the partition of $Q_0$ consisting of cubes
of side $2^{-k}$ with all coordinates of vertices at integral multiples of $2^{-k}$. Now
let $(f_k, A_k)$ be a martingale; for example, $f_k = M[f \mid Z_k]$ is a martingale if f
is integrable on $Q_0$, and there are other examples. Meanwhile, consider
harmonic functions u in $\mathbb{R}^{n+1}_+ = \{(x, t) \mid x \in \mathbb{R}^n, t > 0\}$; for example, the
Poisson integral of an integrable function on $Q_0$ (or in all of $\mathbb{R}^n$) is an example,
and there are other examples. The parallel is obtained by comparing
properties of the function $f_k$ in the special martingale with the function
$u(\cdot, 2^{-k})$, where u is harmonic.
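In one dimension ($n = 1$, so $Q_0 = [0, 1)$), the special martingale $f_k = M[f \mid Z_k]$ replaces f on each dyadic interval of length $2^{-k}$ by its average there. The sketch below assumes, for illustration only, that f is given by its values on the $2^K$ cells of the finest partition. It checks the martingale identity $M[f_{k+1} \mid Z_k] = f_k$ exactly, and the helper name is hypothetical.

```python
from fractions import Fraction


def conditional_expectation(values, k):
    """Average values (one per cell of a 2**K-cell grid on [0, 1)) over the intervals of Z_k.

    Returns a list of the same length that is constant on each interval of length 2**-k.
    """
    K = len(values).bit_length() - 1
    assert len(values) == 2**K and 0 <= k <= K
    width = 2 ** (K - k)  # number of finest cells in each interval of Z_k
    out = []
    for start in range(0, len(values), width):
        block = values[start:start + width]
        avg = Fraction(sum(block), width)
        out.extend([avg] * width)
    return out


vals = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical f on the 8 cells of Z_3
for k in range(3):
    f_k = conditional_expectation(vals, k)
    f_next = conditional_expectation(vals, k + 1)
    assert conditional_expectation(f_next, k) == f_k  # martingale property
print(conditional_expectation(vals, 1))
```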

This parallel was handled rigorously in one case when the martingale 
theorem of Burkholder and Gundy [1970] was generalized by Burkholder, 
Gundy, and Silverstein [1971] to deal with Brownian motion and then to 
prove a Euclidean theorem. 

Except in this one case, however, the idea has been to make the comparison 
and to proceed by analogy. Under the parallel, the martingale S-function 
corresponds to the “Lusin area function” of Fourier analysis if the martingale
$(f_k, A_k)$ is general, and to the “Littlewood-Paley g-function” of Fourier
analysis if the martingale has $f_k = M[f \mid Z_k]$. The parallel also gives useful
information in dealing with functions of bounded mean oscillation. For an 
account of the martingale results, see Garsia [1973]. Fefferman [1975] has 
given a thorough exposition of the Euclidean results and described the 
parallel in more detail. 


Strong ratio limit property: 


A noncyclic recurrent chain P has the strong ratio limit property (SRLP)
if there are positive numbers $\pi_i > 0$ such that


$$\lim_{n \to \infty} \frac{P_{ij}^{(n+m)}}{P_{kl}^{(n)}} = \frac{\pi_j}{\pi_l}$$


for all i, j, k,l, m. Chung and Erdos [1951 R] showed the SRLP holds for 
sums of independent random variables on the integers, and an example 
reproduced in Chung [1960 R] shows the SRLP fails for a certain non-cyclic 
recurrent P. 

Orey [1961] proved the SRLP holds if $P_{ii}^{(n+1)}/P_{ii}^{(n)}$ tends to 1 and it holds if


Pinn+)) 
lim sup —22—_ < 
n> o P Pon 


The latter condition holds for a reversible chain since $P_{ii}^{(2n)}$ is nonincreasing
in n, and hence the SRLP holds for reversible chains.

The result of Chung and Erdos can be interpreted as showing the SRLP 
holds if there is spatial homogeneity of the right kind. Kingman and Orey
[1964 R] proved the SRLP under a much weaker assumption of spatial
homogeneity: that $P_{ii}^{(n)} \ge \varepsilon$ for all i, for some $\varepsilon > 0$ and some n. Sums
of independent random variables clearly have this property.

More recent work has concentrated on transient chains. The SRLP 
requires reformulation in these cases. Pruitt [1965] gives a definition and 
proves theorems analogous to those of Orey [1961]. The book by Orey 
[1971] treats these matters in some detail. See also Freedman [1971]. 


Applied uses of Markov chains: 


Probabilistic functions of finite Markov chains: Let P be a finite Markov 
chain with state space S and starting vector π. Suppose, for each i in S,
that $F_{i\cdot}$ is a probability measure on a finite set Y. We imagine a process in
which P takes place unseen in the background and the matrix F is used at
each time to produce an outcome in Y. Calling the outcomes $y_n$, we have


(*)  $$\Pr[y_0 = k_0 \wedge \cdots \wedge y_n = k_n] = \sum_{i_0, \ldots, i_n \in S} \pi_{i_0} F_{i_0 k_0} P_{i_0 i_1} F_{i_1 k_1} \cdots P_{i_{n-1} i_n} F_{i_n k_n}.$$


For example, a subject in a psychological experiment may have different 
probabilities for making responses according to his state of mind. If we 
imagine his frame of mind as the outcome of a Markov chain, then S is the
set of frames of mind and Y is the set of responses. 

If each row vector $F_{i\cdot}$ has all its mass in one entry, then the situation is
that in lumping. The states in S are lumped in some fashion and the 
lumped states are those in Y. The lumped process need not be a Markov 
chain. Such processes have been studied extensively for a long time. See 
Rosenblatt [1971], Chapter III. 

The opposite extreme occurs when all $F_{ik}$, $P_{ij}$, and $\pi_i$ are > 0. The typical
practical problem that arises is to estimate the parameters π, P, and F if a
finite sequence of outcomes is all that is known. Specifically, one wants
values of π, P, and F that make (*) a maximum, given $k_0, \ldots, k_n$. Baum
et al. [1970] give an iterative procedure in the last paragraph of their paper for
passing from one set of values of π, P, and F to another with the property
that (*) increases to a critical point. Their theorem that (*) increases to a
critical point has been used in modeling letter patterns in English words, in 
predicting sunspot behavior, and in anticipating the stock market. As 
indicated above, it also has applications to psychology and sociology. 
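Evaluating (*) by summing over all hidden state sequences takes time exponential in n. A common way to organize the computation is the so-called forward recursion. The sketch below illustrates that recursion only, not the re-estimation procedure of Baum et al. It carries $a_t(i) = \Pr[y_0 = k_0, \ldots, y_t = k_t,\ \text{hidden state } i \text{ at time } t]$, with $a_0(i) = \pi_i F_{i k_0}$ and $a_{t+1}(j) = \sum_i a_t(i) P_{ij} F_{j k_{t+1}}$, so that $\sum_i a_n(i)$ is the right side of (*). The parameters are made up, and `Fm` stands for the matrix F.

```python
from fractions import Fraction as F
from itertools import product


def likelihood_brute(pi, P, Fm, ks):
    """Right side of (*): sum over all hidden paths i_0, ..., i_n."""
    S = range(len(pi))
    total = F(0)
    for path in product(S, repeat=len(ks)):
        w = pi[path[0]] * Fm[path[0]][ks[0]]
        for prev, cur, k in zip(path, path[1:], ks[1:]):
            w *= P[prev][cur] * Fm[cur][k]
        total += w
    return total


def likelihood_forward(pi, P, Fm, ks):
    """Same quantity via a_{t+1}(j) = sum_i a_t(i) P_ij F_{j k_{t+1}}."""
    S = range(len(pi))
    a = [pi[i] * Fm[i][ks[0]] for i in S]
    for k in ks[1:]:
        a = [sum(a[i] * P[i][j] for i in S) * Fm[j][k] for j in S]
    return sum(a)


# Made-up two-state example with outcome set Y = {0, 1}.
pi = [F(1, 2), F(1, 2)]
P = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
Fm = [[F(3, 4), F(1, 4)], [F(1, 3), F(2, 3)]]
ks = [0, 1, 1, 0, 1]
assert likelihood_brute(pi, P, Fm, ks) == likelihood_forward(pi, P, Fm, ks)
print(likelihood_forward(pi, P, Fm, ks))
```

The forward version needs on the order of $n |S|^2$ multiplications instead of $|S|^{n+1}$ terms, which is what makes maximizing (*) over long outcome sequences practical.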

Optimal stopping problems: Let $(y_n, F_n)$ be a denumerable stochastic
process, and let $x_n = x_n(y_0, y_1, \ldots, y_n)$ be real-valued and measurable.
Suppose the $x_n$ are integrable. The problem is to find


$$V = \sup_t M[x_t],$$


where the supremum is taken over all random times t. V is the value of the
$x_n$ process. If the supremum is attained for some t, t is an optimal strategy,
and a further problem is to describe t.

We are to regard the $y_n$’s as some observable outcomes and the $x_n$’s as
rewards. We are allowed to choose the time of obtaining our reward, 
without clairvoyance, and the problem is to maximize the payoff. There is 
an extensive theory in this generality. See Chow, Robbins, and Siegmund 
[1971]. Neveu [1972a] treats the martingale case, beginning with “Le 
problème de Snell,” solved in Snell [1952 R]. 

In many applications the $\{y_n\}$ are the outcome of a Markov chain, and the
nth reward function is $x_n(y_0, \ldots, y_n) = f(y_n)$, with f a function on the state
space that is independent of n. The value is simply $v_i = \sup_t M_i[f(y_t)]$.
If f is bounded, then v is the least nonnegative superregular function ≥ f.
See Dynkin and Yushkevich [1969], Chapter 3, for a treatment of the problem 
and discussion of strategy. A deeper investigation is the book by Sirjaev 
[1973]. 
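For a finite chain and bounded f, the characterization of v as the least superregular function ≥ f suggests computing v by iterating $v \leftarrow \max(f, Pv)$ from $v = f$. This is the standard value-iteration scheme, assumed here rather than taken from the text, and its iterates increase toward v. The sketch below applies it to a hypothetical example, simple random walk on {0, ..., 4} absorbed at both ends, where v is known to be the least concave majorant of f. An optimal strategy is to stop the first time the chain enters the set where v = f.

```python
def optimal_stopping_value(P, f, tol=1e-12, max_iter=100_000):
    """Iterate v <- max(f, P v) from v = f; return the limit and the stopping set {v = f}."""
    n = len(f)
    v = list(map(float, f))
    for _ in range(max_iter):
        Pv = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        new = [max(fi, pvi) for fi, pvi in zip(f, Pv)]  # stop now vs. continue one step
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            v = new
            break
        v = new
    stop = [i for i in range(n) if abs(v[i] - f[i]) < 1e-9]
    return v, stop


# Simple random walk on {0, ..., 4}, absorbed at 0 and 4 (hypothetical example).
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
f = [0.0, 2.0, 1.0, 3.0, 0.0]
v, stop = optimal_stopping_value(P, f)
print(v, stop)  # [0.0, 2.0, 2.5, 3.0, 0.0] [0, 1, 3, 4]
```

At state 2 it pays to continue, because the average of the neighboring values, 2.5, exceeds the immediate reward 1. Everywhere else the chain should stop at once.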


Recurrent potential theory: 


Orey [1964 R] developed a recurrent potential theory that avoids the 
notion of a normal chain and proceeds from axioms for a potential operator. 
Neveu [1972b] developed recurrent potential theory from a different
perspective, proceeding as follows. For a function h on S with 0 ≤ h ≤ 1,
define


$$U_h = \sum_{n=0}^{\infty} (P D_{1-h})^n P = \sum_{n=0}^{\infty} P (D_{1-h} P)^n,$$


where $D_h$ is the diagonal matrix with $h_i$ as ith diagonal entry.
When h = 1, $U_h = P$. When h = 0, $U_h = P + P^2 + P^3 + \cdots$. In
the general case with f ≥ 0,


$$(U_h f)_i = M_i\Big[\sum_{n=0}^{\infty} (1 - h(x_1)) \cdots (1 - h(x_n))\, f(x_{n+1})\Big].$$


If we write $U_E$ for $U_h$ when h is the characteristic function of E, we have


$$(U_E f)_i = M_i\Big[\sum_{1 \le n \le \sigma_E} f(x_n)\Big],$$

where $\sigma_E$ is the first time $n \ge 1$ at which the chain is in E.
When f is 1 on E and 0 elsewhere, the right side reduces to the probability
that the chain started in state i ever returns to the set E.
Let P be recurrent with positive regular measure α. Neveu proves that
there is some h on S with 0 ≤ h ≤ 1 such that $U_h h = 1$ and $U_h \ge 1\alpha$.
Fix such an h, put $V = U_h - 1\alpha$, and define


$$W = \sum_{n=0}^{\infty} (V D_h)^n V.$$


Clearly W ≥ 0. The finiteness of W is settled by the facts that $Wh = c1$
and $\alpha D_h W = c\alpha$, where c is the constant $(1 - \alpha h)/(\alpha h)$. The operator I + W
is the potential kernel, and Neveu develops an appropriate theory for it.
See also Revuz [1975]. 


Transient boundary theory: 


Transient boundary theory for general denumerable Markov chains stands
about where it was in 1966. 

Dynkin [1969] gave an account of the theory that does not use extended 
chains. For the theory of the exit boundary the idea is to use a
martingale-upcrossing argument to deal with a superregular measure μ. If
$f(i) = \mu_i/(\pi N)_i$, the key result is that $\lim_{n \to \infty} f(x_n(\omega))$ exists a.e. on infinite paths,
provided f satisfies a suitable integrability condition. From this result, 
the result we call Theorem 10-18 follows without reference to any extended 
chains, and the rest of the theory requires no change. 

Athreya and Ney [1972] apply transient boundary theory to branching 
processes in Chapter II of their book. 


Sums of independent random variables: 


Sums of independent random variables for a countable group that is not 
necessarily abelian can be defined just as in the abelian case (see Chapter 4), 
as long as left and right are distinguished carefully. Let P be the transition 
matrix for such a chain, and suppose that the states form a single class. 
Regarding the question of transience vs. recurrence, Kesten [1959]
considered a symmetric P as a linear operator on square-summable sequences.
He proved that the group admits such a P with spectral radius one if and 
only if the group is amenable, i.e., there is a non-zero left-invariant positive 
linear functional on the bounded functions on the group. If the spectral 
radius is less than one, then not only does N have finite entries but also N 
is a bounded operator on square-summable sequences; hence recurrence 
implies spectral radius one. Day [1964] removed the hypothesis of symmetry 
in Kesten’s theorem and found further equivalent conditions. His theorem 
has been reproved several times by other authors. 

For the exact question of transience vs. recurrence the results are less 
decisive. In his book Spitzer [1976] settles the case of processes on the 
lattice points in Euclidean space. Dudley [1962] proved that a countable 
abelian group has a recurrent P (with one class of states) if and only if the 
maximum number of linearly independent elements is at most 2. In the 
non-abelian case Kesten [1967] conjectured that the existence of a recurrent 
P for a group is related to the growth in n of the number of elements of the
group expressible as a product of n generators. Milnor [1968a] showed that 
this growth function is approximately independent of the set of generators; 
he pointed out that the existence of a symmetric P with spectral radius less 
than 1 implies exponential growth, and he gave an example of a solvable 
group and a symmetric P with spectral radius 1 and with exponential growth. 
Milnor [1968b] and Wolf [1968] considered classes of countable groups and 
gave conditions under which the growth function is of polynomial size or of 
exponential size. 

Ney and Spitzer [1966] compute the Martin boundary for transient sums 
of independent random variables on a lattice with nonzero mean. Kesten 
and Spitzer [1965] prove the existence of the potential kernel for recurrent 
sums of independent random variables on countable abelian groups, and they 
consider the Martin boundary. Kesten [1967] generalized this work to 
general countable groups, although the extent to which non-trivial non- 
abelian groups can admit recurrent processes is still not known. 

Derriennic [1975] deals with sums of independent random variables on a 
free group with n > 1 generators. It is assumed that the transition matrix 
has only finitely many nonzero entries in each row and that all states com- 
municate. He shows that such a process is transient and that the boundary 
consists of all “reduced infinite words.” The special case in which the process 
in one step can move from a word only to the product of that word by a 
generator or its inverse was considered earlier by other authors. When in 
the special case all the 2n one-step probabilities are equally likely, the 
resulting example is one that arises in the theory of algebraic groups. 